GEOSTATISTICS AND REMOTE SENSING METHODS IN THE CLASSIFICATION OF IMAGES OF AREAS CULTIVATED WITH CITRUS

This study compares the accuracy of three image classification methods, two from remote sensing and one from geostatistics, applied to areas cultivated with citrus. The 5,296.52 ha study area is located in the municipality of Araraquara, in the central region of the state of São Paulo (SP), Brazil. The multispectral image from the CCD/CBERS-2B satellite was acquired in 2009 and processed in the Geographic Information System (GIS) SPRING. Three classification methods were used, one unsupervised (Cluster) and two supervised (Indicator Kriging/IK and Maximum Likelihood/Maxver), in addition to the screen classification taken as field checking. The reliability of the classifications was evaluated by the Kappa index. According to the Kappa index, Indicator Kriging obtained the highest degree of reliability for bands 2 and 4. Moreover, the Cluster method applied to band 2 (green) produced the best-quality classification among all methods. Indicator Kriging was the classifier whose total citrus area was closest to the field check, underestimating it by 3.01%, whereas Maxver overestimated the total citrus area by 42.94%.


INTRODUCTION
The Brazilian citrus industry is one of the most efficient and dynamic in the world, responding quickly to changes in international demand. São Paulo is the state with the largest citrus production area in Brazil, where the domestic production of orange in 2011 was approximately 19.655 million tons (IBGE, 2012).
The incorporation of geostatistical procedures based on kriging into agronomic studies is becoming more common. Thus, many remote sensing professionals (PARDO-IGUZQUIZA et al., 2010; DUVEILLER & DEFOURNY, 2010; PARDO-IGUZQUIZA et al., 2011; MULDER et al., 2011) have applied these techniques, especially in digital image classification for land use mapping based on semi-detailed surveys. RIVERO et al. (2007) studied total phosphorus in the soil, taken as the primary variable for kriging and cokriging, using satellite imagery data as the secondary variable. They found that multivariate cokriging predicted total soil phosphorus better than kriging, because kriging, a univariate method in which only one variable is studied, is based solely on the observed values of the soil property and ignores its joint spatial variability with other components of the ecosystem. The integration of remote sensing with geostatistics takes into account spatial autocorrelation and covariance and thus substantially improves the quality of the modeling.
According to RIVERO et al. (2009), the limitation on the number and frequency of soil samples that can be taken can be overcome by combining remote sensing and geostatistics, which represents a noninvasive and less costly method to monitor soil nutrients in complex wetland systems. PARDO-IGUZQUIZA et al. (2011), when using cokriging to fuse digital images, found that this methodology increased the accuracy of the image, as demonstrated both by the visual appearance of the fused image and by the quantitative indexes that evaluate its quality. ROSSI et al. (1994) applied a geostatistical technique known as indicator kriging to a Landsat image of Chiapas, Mexico. First, the image was classified according to ground cover as pasture or non-pasture. For each pixel obscured by clouds or by their shadows, where it was not possible to verify the presence of pasture, kriging provided the probability of having pasture (value 1) or not (value 0). Variographic analysis was used to characterize the spatial continuity of the image for use in kriging.
Kriging was also used to determine the potential health risks of arsenic contamination (FIGUEIRA et al., 2007), chemical pollution of soil (TAVARES et al., 2008) and to map the vegetation in western New York (WANG, 2007).
Knowing the land use is essential to the agricultural economy and to planning the implementation of new crops. Thus, there is a need for rapid and effective tools for mapping and quantifying crops, which is made possible by remote sensing images combined with efficient classification algorithms.
The aim of this study was to compare the accuracy of classification methods applied to orbital images of areas cultivated with citrus in the mesoregion of Araraquara, state of São Paulo, Brazil, using traditional remote sensing methods (Maxver and Cluster) and a geostatistical method (Indicator Kriging) for the discrimination and quantification of the cultivated area.

MATERIAL AND METHODS
The study area, with 5,296.52 ha, is located in the central region of the state of São Paulo and covers the municipalities with the largest planted area of citrus in the mesoregion of Araraquara. The area lies between the geographical coordinates 21º54'19" and 21º50'54" south latitude and 48º43'35" and 48º39'8" west longitude, at altitudes ranging from 447 to 580 meters. The predominant soil in the area was classified as Red Latosol (EMBRAPA, 2006). According to the Köppen classification, the climate in the region is Cwa, humid subtropical (mesothermal) with a humid summer and a dry winter, and the average temperature of the warmest month is above 22ºC (CUNHA et al., 2006). The average annual rainfall is between 1,100 and 1,700 mm.
To create the database and process the images, the Geographic Information Processing System (SPRING), version 5.1.7 (INPE, 2010), was used because it has all the tools necessary for the proposed analyses.
Digital images from the CCD (High Resolution Imaging Camera) of the CBERS-2B satellite (CBERS, 2010), with a resolution of 20 meters, were used. The bands used were 2, 3 and 4, corresponding to green, red and infrared, respectively. The images were obtained free of charge from the website of the National Institute for Space Research (http://www.dgi.inpe.br/CDSR/). The satellite image was chosen according to the presence of clouds and cloud shadows and the visibility of the study site. Therefore, an image from the dry period at orbit/point 157/124 was sought, and the one from 04/16/2009 was selected.
During preprocessing, the images were imported, georeferenced, enhanced and segmented by the region-growing method. Segmentation was performed so that the Cluster classifier could be applied; the other classifiers do not require this step. To georeference the CBERS-2B images, a 2002 Landsat image (221/075), which uses the WGS84 datum, was used. The georeferenced Landsat image was obtained from www.landsat.org.
In the image classification process, three methods were applied. One is unsupervised, Cluster (ISOSEG), and two are supervised, Indicator Kriging and Maximum Likelihood (Maxver); a screen classification was also made and taken as reference. Classifications were made for bands 2, 3 and 4 separately.
The Cluster and Maximum Likelihood classifiers were chosen because they are the most widely used in image classification and to allow comparison with the proposed classifier, i.e., Indicator Kriging.
A visual classification was made to serve as the reference map for assessing the performance of the digital image classifiers. The reference map of the study area was prepared by screen classification, in which the interpretation and delineation of the targets present in the image are carried out directly by the analyst and are therefore subject to subjectivity. Since this is a laborious and time-consuming classification method, other classifiers should be studied to discriminate cultivated areas. Since the goal was to determine only the area cultivated with citrus, two (2) classes of use were defined: citrus and non-citrus.
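Kriging is a geostatistical interpolation method that uses the variogram parameters, according to the equation:

$$Z^{*}(x_0) = \sum_{i=1}^{n} \lambda_i \, Z(x_i)$$

where: $Z^{*}(x_0)$ = value estimated at the unsampled position $x_0$; $Z(x_i)$ = values obtained by field sampling; and $\lambda_i$ = weights associated with the value sampled at the position $x_i$.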
The parameters of the variogram are: range (the distance at which the variogram reaches the threshold, that is, the distance up to which the variable presents spatial dependence); nugget effect (reflects the analytical error, indicating an unexplained variability from one point to another, which may be due either to measurement errors or to microvariations not detected because of the sampling distance); structural component (how much of the variance depends on the distance); and threshold, or sill (the value at which the variogram stabilizes, approximately equal to the variance of the data).
The variogram of the experimental data should be adjusted to the theoretical model, which can be spherical, exponential, Gaussian or linear.
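To make the link between the fitted variogram and the kriging weights concrete, the following is a minimal sketch of ordinary kriging with a spherical model. It is not the SPRING implementation: the function names, the four sample points, and the nugget and sill values are hypothetical, and only the range (about 1,400 m) is of the order reported in this study.

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    """Spherical variogram; `sill` is the threshold (nugget + structural component)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    g = np.where(h >= rng, sill, g)   # flat beyond the range
    return np.where(h == 0, 0.0, g)   # gamma(0) = 0 by definition

def ordinary_kriging(coords, values, target, nugget, sill, rng):
    """Return the kriged estimate at `target` and the weights lambda_i."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Variogram between all pairs of samples, bordered by the unbiasedness row/column
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, nugget, sill, rng)
    A[n, n] = 0.0
    # Variogram between each sample and the target position
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(coords - np.asarray(target, float), axis=1),
                      nugget, sill, rng)
    weights = np.linalg.solve(A, b)[:n]   # last entry is the Lagrange multiplier
    return float(weights @ values), weights

# Hypothetical indicator samples (1 = citrus, 0 = non-citrus) on a 100 m square
pts = [[0, 0], [100, 0], [0, 100], [100, 100]]
ind = [1, 0, 1, 1]
estimate, lam = ordinary_kriging(pts, ind, (50, 50), nugget=0.05, sill=0.20, rng=1400.0)
```

Because the weights are constrained to sum to one, the estimate for indicator data can be read as a probability of citrus at the target position.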
The standard error of estimation is evaluated by cross-validation that measures quantitatively the variogram adjustment and errors arising from it in kriging, using the concepts defined by DAVIS (1987).
Indicator Kriging is applied after the original values are transformed into binary data, i.e., 0 and 1, according to one or more thresholds of interest. The indicators for each threshold are modeled by a structural function, the indicator variogram, and used by ordinary kriging to provide the estimated probability of exceeding or not exceeding the limits of interest (BADEL et al., 2011).
For classification using Indicator Kriging, an image sampling was performed to obtain points containing UTM coordinates and the value of reflectance of bands 2, 3 and 4.
The version of the program used did not support the full amount of data (one point every 20 m in a 20 x 20 m grid), making it necessary to reduce the number of points. Tests were carried out, and the densest sampling supported by the program for the kriging interpolation was one point every 85 m (85 x 85 m grid). The sampling was then densified in the citrus areas, yielding 7,216 points in an area of 5,296.52 ha. Then, the reflectance values were converted into binary values, coded as 1 for the presence of citrus and as 0 for its absence, i.e., any other land cover.
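The binary coding can be sketched as follows. This is an illustration only: the function and dictionary names are hypothetical, the citrus intervals are the per-band ranges reported in the Results section, and treating the interval limits as inclusive is an assumption.

```python
import numpy as np

# Citrus intervals per band as reported in the Results (limits assumed inclusive)
CITRUS_RANGES = {2: (26, 29), 3: (23, 28), 4: (75, 89)}

def to_indicator(values, band):
    """Code each sampled value as 1 (citrus) or 0 (non-citrus) for the given band."""
    lo, hi = CITRUS_RANGES[band]
    values = np.asarray(values)
    return ((values >= lo) & (values <= hi)).astype(np.uint8)

# Hypothetical band 2 samples
codes = to_indicator([25, 26, 29, 30], band=2)
```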
The variograms were fitted under the stationarity assumptions of the intrinsic hypothesis, as described by ROSSI et al. (1994), using Matheron's classical variogram estimator applied to the digital number (DN) of the image:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$

where: $\gamma(h)$ is half the mean squared difference between the values of pixel pairs separated by the vector $h$; and $N(h)$ is the number of pairs of pixel values $z(x_i)$ and $z(x_i + h)$. The semivariance $\gamma(h)$ therefore depends on the angle and length of the vector $h$.
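As an illustration of the classical estimator, the sketch below computes an omnidirectional experimental variogram by grouping all point pairs into distance classes. The function name, the lag binning rule, and the toy data are assumptions; direction is ignored for brevity, and the all-pairs approach grows quadratically with the number of points.

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags):
    """Matheron estimator: gamma[k] for pairs with distance in [k*lag, (k+1)*lag)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    bins = np.floor(dist / lag).astype(int)
    gamma = np.full(n_lags, np.nan)                   # NaN where a class has no pairs
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        mask = bins == k
        counts[k] = mask.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sqdiff[mask].mean()      # half the mean squared difference
    return gamma, counts

# Toy transect: four points 1 unit apart with alternating indicator values
xy = [[0, 0], [1, 0], [2, 0], [3, 0]]
z = [0, 1, 0, 1]
gamma, counts = experimental_variogram(xy, z, lag=1.0, n_lags=4)
```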
The map obtained by Indicator Kriging shows the probability of occurrence of citrus areas and, therefore, areas with no citrus.
The maximum likelihood classifier assumes that the data of each class in each band are normally distributed and calculates the probability that a given pixel belongs to a specific class. Each pixel is thus assigned to the class with the highest probability (YANG et al., 2011). For the maximum likelihood classification, a context file was created containing the bands that take part in the classification, with pixel-by-pixel classification selected as the method, and training samples were acquired by drawing areas over the image. The validity of the samples was verified, aiming at a confusion matrix with the main diagonal close to 100%; then the image was classified by maximum likelihood, and the class mapping was performed.
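The decision rule can be sketched for a single band as below. This is a simplified illustration rather than the Maxver implementation in SPRING: it assumes equal prior probabilities, omits the acceptance threshold, and uses hypothetical function names and training values.

```python
import numpy as np

def fit_classes(samples):
    """samples: dict of class name -> training pixel values; returns (mean, variance)."""
    return {c: (np.mean(v), np.var(v, ddof=1)) for c, v in samples.items()}

def classify(pixels, params):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    pixels = np.asarray(pixels, dtype=float)
    names = list(params)
    loglik = np.stack([
        -0.5 * np.log(2 * np.pi * var) - (pixels - mu) ** 2 / (2 * var)
        for mu, var in (params[c] for c in names)
    ])
    return np.array(names)[np.argmax(loglik, axis=0)]

# Hypothetical training samples for one band
params = fit_classes({"citrus": [26, 27, 28, 29], "non_citrus": [38, 40, 42, 44]})
labels = classify([27, 41], params)
```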
The unsupervised classifier (Cluster) is based on cluster analysis, a multivariate statistical method. It is able to identify classes within an initial set of data without a priori knowledge of the study area. To perform this classification, the first step was image segmentation, which required the similarity and area thresholds to be provided. After segmentation, the ISOSEG classifier of SPRING was applied and the area was then mapped.
The accuracy of the three automatic classifications was obtained by comparing the result of each one with the visual interpretation. The reliability of the classifications was assessed by the Kappa index, allowing later comparison of the performance of each classifier.
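For reference, the Kappa index can be computed from a confusion matrix between a classified map and the reference map as in the sketch below. The function name and the 2 x 2 matrix are hypothetical and are not values from this study.

```python
import numpy as np

def kappa_index(confusion):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    observed = np.trace(m) / n
    chance = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2
    return (observed - chance) / (1.0 - chance)

# Hypothetical counts: rows = reference (citrus, non-citrus), columns = classifier
kappa = kappa_index([[40, 10],
                     [5, 45]])
```

Unlike overall accuracy, which uses only the diagonal, the chance-agreement term uses the row and column totals, so every cell of the matrix contributes.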

RESULTS AND DISCUSSION
The map obtained by visual interpretation, which represents the field checking and shows the citrus plantations, is presented in Figure 1.
The reflectance values for the citrus crop are in the range 26 to 29 for band 2, 23 to 28 for band 3 and 75 to 89 for band 4. Values below or above these ranges represent other types of ground cover, referred to as non-citrus. In the geostatistical analysis, the reflectance values of bands 2, 3 and 4 were best fitted by the spherical model (Table 1 and Figure 2). The variogram range of the bands varied from 1,379.48 to 1,553.43 m; within this radius the reflectance values are spatially dependent, and these ranges were used in the interpolation process to obtain a final map with reliable values.
The spatial dependence index (SDI) of the reflectance of the three bands was moderate (25% < SDI < 75%), according to ZIMBACK (2001). The variogram fits were verified by cross-validation, which gave a Pearson correlation coefficient greater than 0.75 between the observed and predicted values; the theoretical model fitted to the experimental variogram was therefore considered suitable for interpolation by IK. Accordingly, the fitted model, the range, the nugget effect and the threshold were used in the interpolation by IK, generating a probability map of the citrus area.
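The text does not state how the SDI is calculated. The sketch below assumes the commonly used form, the structural component as a percentage of the threshold (nugget plus structural component), together with the class limits quoted above; both the formula and the example values are assumptions, not figures from Table 1.

```python
def spatial_dependence_index(nugget, sill):
    """Assumed SDI: structural component (sill - nugget) as a percentage of the sill."""
    return 100.0 * (sill - nugget) / sill

def dependence_class(sdi):
    """Class limits as quoted in the text: moderate when 25% < SDI < 75%."""
    if sdi <= 25.0:
        return "weak"
    if sdi < 75.0:
        return "moderate"
    return "strong"

# Hypothetical variogram parameters
sdi = spatial_dependence_index(nugget=0.1, sill=0.2)
label = dependence_class(sdi)
```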
After the map of the probability of occurrence of citrus was generated, it was sliced into two classes: probabilities from 0 to 75% were considered non-citrus and probabilities above 75% were considered citrus (Figure 3). Other intervals were evaluated, but the interval from 75% to 100% was considered the best for the citrus class.
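The slicing step amounts to a single cutoff on the probability map, sketched below with a hypothetical function name and probabilities expressed as fractions between 0 and 1.

```python
import numpy as np

def slice_probability(prob, cutoff=0.75):
    """Return 1 (citrus) where the kriged probability exceeds the cutoff, else 0."""
    prob = np.asarray(prob, dtype=float)
    return (prob > cutoff).astype(np.uint8)

# Hypothetical probabilities; a value of exactly 0.75 stays in the non-citrus class
classes = slice_probability([0.0, 0.75, 0.76, 1.0])
```

Other cutoffs can be tested by changing `cutoff` and comparing each resulting map with the reference map.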
When the IK maps were compared with the field checking map, band 4 (infrared) of IK was the one that best classified the citrus areas compared to the other bands. The map generated by IK shows smoothed, rounded limits for the citrus areas, unlike the other methods, which show angular limits and rough details. As Maxver is a classifier that needs training samples, this method was able to determine the citrus areas satisfactorily with an acceptance threshold of 99% (Figure 4). In the analysis of the training samples, Maxver performed well, with an average confusion of 2.62% for band 2, 21.61% for band 3 and 18.79% for band 4, which explains the small areas misclassified as citrus; these areas were close to drainage channels.
When the maximum likelihood maps were compared with the field checking map, band 2 (green) was the one that best classified the citrus areas compared to the other bands.
In the unsupervised classifier Cluster (Figure 5), the citrus class was identified and grouped with an acceptance threshold of 95%. The reflectance of other uses had little influence on this process, because there were only a few areas of confusion in bands 2 and 4; in band 3, on the other hand, there was greater confusion between uses. When the field checking and Cluster maps were compared, the areas near the drainage channels were the ones whose spectral values were most often confused with citrus, as occurred with Maxver.
The maps generated by the three digital classifiers showed that IK was the classification that did not present fragmented areas and that classified fewer riparian areas as citrus.
The Kappa index considers all elements present in the classified map, instead of using only the diagonal elements or some points. The values and indexes calculated for the classifiers are shown in Table 2. The classification quality was strong for bands 2 and 4 of the IK classifier and for band 2 of the Cluster classifier, with indexes of 0.6546, 0.6226 and 0.7548, respectively. The other band classifications had moderate quality. SANCHEZ et al. (2008) studied the discrimination of citrus varieties with CCD/CBERS-2 images using the GIS SPRING and found Kappa index values lower than those found in this study for the Maxver and Cluster classifiers. Indicator Kriging was the method that best classified the citrus areas in band 4, with strong agreement, as per the Kappa coefficients for bands 2 and 4. This indicates that the IK classifier was the method that least confused other types of land use with citrus and that identified citrus areas well. IK, as a supervised classifier, has an advantage over Maxver in training or sampling, because Maxver requires training samples in all areas of citrus and of the other uses in the image. IK, on the other hand, only needs the reflectance values of citrus in one area and, based on these values, applies the cutoff for the binary transformation of the data.
Band 2 (green) had the best classification quality across the three classifiers, indicating that it is the band that best represents the reflectance of the vegetation.
Table 3 presents the quantification of the area planted with citrus by the different classifiers. The IK classifier for band 3 showed the total citrus area closest to the field check, with a difference of -3.01%, i.e., it classified 64.25 ha less citrus than the study area actually has. Maxver for band 4 was the classifier that most overestimated the total citrus area (42.94%).

TABLE 3. Quantification of citrus area by classifiers and differences of area.
Columns: Attribute; Citrus area; Difference; Difference in %.
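The differences of area can be derived from pixel counts as in the following sketch. The function names and the example areas are hypothetical; only the 20 m pixel size comes from the image description.

```python
def pixels_to_hectares(n_pixels, pixel_size_m=20.0):
    """Area of n square pixels in hectares (1 ha = 10,000 m2)."""
    return n_pixels * pixel_size_m ** 2 / 10_000.0

def area_difference(classified_ha, reference_ha):
    """Absolute (ha) and relative (%) difference of a classifier against the reference."""
    diff = classified_ha - reference_ha
    return diff, 100.0 * diff / reference_ha

# Hypothetical example: 1,940 ha classified against a 2,000 ha reference
diff_ha, diff_pct = area_difference(1940.0, 2000.0)
```

A negative percentage means the classifier underestimated the reference area. By the same arithmetic, a deficit of 64.25 ha reported as -3.01% would imply a reference citrus area of roughly 2,130 to 2,140 ha; this is an inference from the rounded figures, not a value read from the table.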

FIGURE 1. Citrus areas resulting from visual interpretation.

TABLE 2. Comparison of classifiers by Kappa Index.