UNCERTAINTIES IN THE PREDICTION OF SPATIAL VARIABILITY OF SOIL CO 2 EMISSIONS AND RELATED PROPERTIES ( 1 )

The soil CO2 emission has high spatial variability because it depends strongly on soil properties. The purpose of this study was to (i) characterize the spatial variability of soil respiration and related properties, (ii) evaluate the accuracy of results of the ordinary kriging method and sequential Gaussian simulation, and (iii) evaluate the uncertainty in predicting the spatial variability of soil CO2 emission and other properties using sequential Gaussian simulations. The study was conducted in a sugarcane area, using a regular sampling grid with 141 points, where soil CO2 emission, soil temperature, air-filled pore space, soil organic matter and soil bulk density were evaluated. All variables showed spatial dependence structure. The soil CO2 emission was positively correlated with organic matter (r = 0.25, p < 0.05) and air-filled pore space (r = 0.27, p < 0.01) and negatively with soil bulk density (r = -0.41, p < 0.01). However, when the estimated spatial values were considered, the air-filled pore space was the variable mainly responsible for the spatial characteristics of soil respiration, with a correlation of 0.26 (p < 0.01). For all variables, individual simulations represented the cumulative distribution functions and variograms better than ordinary kriging and E-type estimates. The greatest uncertainties in predicting soil CO2 emission were associated with areas with the highest estimated values, which produced estimates from 0.18 to 1.85 t CO2 ha-1, according to the different scenarios considered. The knowledge of the uncertainties generated by the different scenarios can be used in inventories

of greenhouse gases, to provide conservative estimates of the potential emission of these gases.Index terms: soil respiration, geostatistics, ordinary kriging, sequential Gaussian simulation, sugarcane.

INTRODUCTION
In 2005, agriculture accounted for emissions of 5.1 -6.1 Gt CO 2 -eq, representing 10-12 % of the global emissions and making it the second largest source of anthropogenic emissions of greenhouse gases (GHGs) (IPCC, 2007).However, due to the complexity and uncertainty involved in quantifying soil carbon (C) balance, in most studies, estimates of GHG emissions from agricultural soils are included in those related to forestry and land use changes (IPCC, 2007).This complexity is due to the high spatial variability (La Scala et al., 2000;Panosso et al., 2009;Brito et al., 2010;Teixeira et al., 2011a,b) and temporal variability (Herbst et al., 2010;Teixeira et al., 2011a,b) in soil CO 2 emissions (FCO 2 ).Spatial variations in FCO 2 are primarily the result of physical, chemical and biological soil properties such as organic matter content (Søe & Buchmann, 2005), carbon stock (Panosso et al., 2011), cation exchange capacity (La Scala et al., 2000 ), microbial biomass (Søe & Buchmann, 2005) and soil mineralogy (La Scala et al., 2000).
Geostatistical analysis has been used to describe the spatial variability of FCO 2 , using techniques such as ordinary kriging (OK) and sequential Gaussian simulation (SGS) for estimations in unsampled regions of the study area.Although these methods both produce interpolated values, they are considered distinct and have different goals and results.The OK method aims to provide the best and, therefore, the only estimation of variables of an unsampled location, whereas the goal of SGS is to provide values that reproduce the characteristics and behavior of the source data (Deutsch & Journel, 1998).Another important feature of SGS is its ability to assess the uncertainties associated with predictions by taking into account equiprobable multiple stochastic realizations.However, in OK, the variance calculations consider only the location of the data points and do not provide any measure of the spatial uncertainty of these estimates, which are obtained by SGS.Although OK is the most widely used method for the interpolation of FCO 2 values (La Scala et al., 2000;Ohashi & Gyokusen, 2007;Panosso et al., 2009;Brito et al., 2010), simulation has been preferred in several recent studies (Herbst et al., 2010, Teixeira et al., 2011a,b), providing an alternative to the smoothed values estimated by OK.
Because of the importance of sugarcane cultivation in the State of São Paulo and because of the uncertainty in the process of CO 2 emission and sequestration from agricultural soils, it is crucial to characterize the primary factors driving FCO 2 .This study aimed (i) to characterize the variability and spatial distribution of FCO 2 as well as physical and chemical properties, such as air-filled pore space, soil temperature, organic matter content and bulk density (BD); (ii) to evaluate the accuracy of the results of OK and SGS; and (iii) to determine the uncertainty in predicting the spatial variability of the variables using SGS.

MATERIAL AND METHODS
The study was conducted on the Fazenda Santa Olga (21° 21' S, 48° 11' W), which belongs to the São Martinho Mill, in Guariba, São Paulo State.According to the Thornthwaite classification, the local climate can be defined as B1rB'4a' -type mesothermal humid, with little water stress, evapotranspiration and less than 48 % annual evapotranspiration in the summer.The soil of the area was classified as high clay Oxisol (Eutrustox, USDA Soil Taxonomy).
Sugarcane , in a management system with mechanical harvesting, had been grown for eight years on the area.At the time of the study, the soil was devoid of vegetation and covered with large amounts of crop residue (12 t ha -1 ).On July 13, 2010, a regular grid of 60 × 60 m was installed with 141 points, between 0.5 to 10 m away from the installation site of PVC rings, used to assess soil CO 2 emissions (La Scala et al., 2000) (Figure 1).
The FCO 2 was assessed using three portable systems (LI-COR 8100).The soil chambers were coupled to a system that quantifies the internal concentration of CO 2 through optical absorption spectroscopy in the infrared spectral region.Before starting the experiment, the machines were tested and calibrated.The evaluations were conducted in the morning (8:00-9:30 h a.m.) over seven days, on Julian days 195, 196, 197, 200, 201, 204 and207 in 2010 (Teixeira et al., 2011b).Concurrently with the assessment of FCO 2 values, soil temperature was measured with a sensor coupled to the analysis system (LI-8100) and moisture (% volume) was measured with a TDR (Time Domain Reflectometer) -Hydrosense system, both in the 0-0.10 m layer.After the evaluations, soil was sampled from the same layers by the volumetric ring method to obtain soil BD and total pore volume (TPV) (Embrapa, 1997).The airfilled pore space (AFPS) was obtained by the difference between the TVP and soil moisture.Disturbed samples were also taken from the same sites to determine soil organic matter, using methods described by Raij et al. (1987).In the analysis, FCO 2 , temperature and AFPS were the average values of the seven days of evaluation.
Descriptive statistics (mean ± standard error, standard deviation, coefficient of variation, minimum, maximum, skewness and kurtosis) were previously used in the description of the variables to provide interpretations of geostatistical analysis.The spatial variability of the variables was determined using experimental variogram modeling based on the theory of regionalized variables, which is estimated by: where ) ( ˆh γ is the experimental semivariance for a separation distance h, z(x i ) is the property value at point i, and N(h) is the number of pairs of points separated by distance h.The variogram describes the spatial continuity of the variables as a function of the distance between two locations.In this study, spherical, exponential and Gaussian models were used.
The best fit to the variogram model used in the OK method was based on the cross validation, external validation, and the coefficient of determination (R 2 ) obtained for the model fit.For the variogram used in the simulation process, only the model's coefficient of determination was considered, as it is a stochastic method and validations are available only after completing all stages of the process.For external validation, 14 randomly selected points (10 % of the original data) (Teixeira et al., 2011a) were removed from the sampling grid (Figure 1) before creating the variogram model.The values observed at those points are then compared with the values estimated by OK and simulated by SGS at those points.In two validations (cross and external), observed and estimated values were used to calculate the root mean square error (RMSE).
where n is the number of values used in the validation, z(x i ) is the property value at point i, and ( ) is the estimated value of the property at point i.Smaller RMSE values indicate higher accuracy in the estimates.
Ordinary kriging is a weighted average of the neighboring samples, and the weights for each neighbor are determined through structural correlations in the data, represented by the semivariance as a function of h (Equation 1) and resulting in an estimate of minimum variance.The KB2D routine of the GSLIB -Geostatistical Software Library (Deutsch & Journel, 1998) was used to calculate OK estimates.
Stochastic simulations that reproduce the distribution reference points (data) are prioritized in relation to an optimal prediction with minimum variance estimation (OK).Several simulation algorithms are available.In this study, we used a SGS.In SGS, equiprobable maps of the distribution of variables are produced using the standard model of the variogram.The differences between the various outputs provide a measure of uncertainty in the prediction of spatial data.The stages of the SGS can be studied in more detail in Deutsch & Journel (1998).
In this paper, 300 realizations of each variable were considered.We randomly selected the 30 th , 68 th , 176 th and 214 th realization of each variable to represent 300 individual simulations.We chose random realization to maintain the characteristics of the method, as the selection of outputs based on a specific parameter can limit the amount of uncertainty provided by the method.The SGS procedure was based on the SGSIM routine in the Geostatistical Software Library (Deutsch & Journel, 1998).
Maps of minimum, medium (E-type), maximum and standard deviation were produced by counting the simulated points at each location in the 300 realizations.Pearson's analyses were conducted in the E-type estimates to compare the spatial patterns of the variables.
The accuracy and goodness-of-fit (G statistics) of the probability density function provided by the different interpolation methods were evaluated based on the G statistics, as proposed by several authors (Goovaerts, 2001;Herbst et al., 2010;Bourennane et al., 2010).The simulated/estimated values and the original data located at intervals of probability p, defined by the limits (1-p)/2 and (1+p)/2, were compared using and z(u j ) with j = 1, ... N, the fraction of true values falling into a series of symmetrical probability intervals p is obtained by the equation: is given by: Thus, to evaluate the proximity between simulated/ estimated fractions and the data set, we calculated the G statistic as shown below: where a(p): For inaccurate cases in which , the values of a(p) are twice as accurate as for cases in which . The best reproduction of the cdf is represented by a G value closer to 1.
According to Goovaerts (2001), the accuracy in the reproduction of the variogram can be evaluated by the following equation: where S is the number of lags used to construct the variogram, and To assess the quality of E-type estimates for each variable, the generated maps were subjected to external validation (Bourennane et al., 2010) based on the root mean square error (RMSE) (Equation 2) and mean error (ME) (Teixeira et al., 2011a) data by the following equations: where n is the number of values used in the validation (n=14), z(x i ) is the property value at point i, ( ) ˆi x z is the E-type estimates at point i.For unbiased estimates, ME should be close to zero.R. Bras.Ci.Solo, 36:1466-1475 The ME and RMSE values provide, respectively, measures of bias and accuracy in the estimates generated, however due to the existence of different scales between variables, a comparison of these characteristics between them becomes inappropriate.This problem was solved by standardization of the indices using the standard deviations of each variable (Table 1) (Hengl, 2007).

RESULTS AND DISCUSSION
The average soil CO 2 emission (1.57± 0.07 μmol m -2 s -1 ) in the evaluation period was lower than reported in other studies in Oxisols for sugarcane areas (Panosso et al., 2011), a difference that may be related to a lack of rainfall in the days before the experiment; high soil compaction, as indicated by mean values of BD of 1.50 ± 0.01 g cm -3 ; and low organic matter content (4.75 ± 0.05 g dm -3 ) (Table 1).Brito et al. ( 2009) evaluated soil respiration associated with sugarcane in a green harvest management system at different topographic positions and found average emission values from 2.39 to 2.97 μmol m -2 s -1 .High variability in emissions, characterized by a high CV value (50.02 %), has been reported in several studies, obviating the need for spatial characterization of this property using geostatistics (Teixeira et al., 2011a).The homogeneity observed in the soil temperature (CV = 1.69 %) is most likely related to the soil being covered with crop residues, which is characteristic of green management (Panosso et al., 2009).
The emission was positively linearly correlated with AFPS (r = 0.27, p < 0.01) and organic matter content (r = 0.25, p < 0.05) and negatively correlated with BD (r = -0.41,p < 0.01).The AFPS and BD directly impact the process of gas transport in soil, affecting both the entrance of oxygen, which is necessary for aerobic microbial activity, and the output of CO 2 , which is a byproduct of microbial activity.The OM, in turn, is the primary source of energy used by the microbes, thus explaining its positive relationship with soil respiration.Panosso et al. (2011) used multiple regression analysis to model the FCO 2 in sugarcane areas under green and slash-and-burn management, and they observed that the AFPS was selected for both models, being responsible for 18 % of the variability of FCO 2 under sugarcane cultivation.Linn & Doran (1984) studied the effect of water-filled pores on soil respiration and found the highest rates of emission under conditions close to 60 % filling of the pores, at a density of 1.40 g cm -3 , conditions very different from those obtained in our study, in which, on average, only 37 % of the pores were filled with water.Ohashi & Gyokusen (2007) studied the spatial and temporal variability of soil respiration in forest areas (Cryptomeria japonica D. Don) and identified soil compaction as a factor related to emission.Although the present study did not find a correlation between emissions and soil temperature, the temperature can directly influence the production process by affecting the dynamics of soil microorganisms.The lack of correlation between temperature and respiration can be attributed to the low variability in temperature during the evaluations (Table 1).
The structure of spatial variability was characterized by adjusting the variogram models (Table 2).For most geostatistical techniques, such as OK, a normal data distribution is not required; however, normality is desired to allow the inference of statistics such as maximum likelihood (Deutsch & Journel, 1998).Thus, a logarithmic transformation was used for FCO 2 to correct the asymmetric values shown in Table 1.This procedure is often used to describe spatial FCO 2 (Panosso et al., 2009;Herbst et al., 2010;Teixeira et al., 2011b).After the FCO 2 transformation, the normality of distribution was confirmed by the Kolmogorov-Smirnov test (p > 0.15).However, for predictions based on SGS, transformation should be performed to ensure the normality of the data (Table 2).After the spatial estimates are produced, in both cases, the estimated values are retransformed to the original data scale again.
With the exception of temperature and BD, the same models were obtained for the unprocessed and processed data.For FCO 2 , Gaussian models were the best fit, although in most other studies, spherical models (Brito et al., 2010;Herbst et al., 2010;Teixeira et al., 2011a) and exponential models (Panosso et al., 2009) fit better.Each model describes variability in different ways, and, along with the parameters, is responsible for the spatial patterns of each variable.The spherical model is appropriate for abrupt changes in variables with large distances, and the exponential model describes relatively irregular phenomena, whereas the Gaussian model is adopted for regular and continuous phenomena.Thus, the fit of the Gaussian model to the experimental variogram of FCO 2 demonstrates that, despite the high variability (Table 1), spatial distribution is smoothly distributed in space.This pattern could be due to the arrangement of points in the sampling grid used in this experiment and to the fact that the soil was covered by crop residues.The extremely dense sampling grid enabled the evaluation of small-scale variables (0.50 m), allowing the identification of similarity over short distances.
The degree of spatial dependence was classified as moderate for all variables, characterized by the relation 0.25 < C 0 /C 0 +C 1 < 0.75 (Cambardella et al., 1994).By comparing the models fitted to variograms used in SGS and OK, the similarity of the spatial dependence values and ranges demonstrated that the spatial structure remained even after normalization.These results were similar to those found by Delbari et al. (2009) in their evaluation of the spatial uncertainty of soil water content.Panosso et al. (2009) studied the FCO 2 in soils under sugarcane with green harvest management and found moderate spatial dependence structures.Herbst et al. (2010), evaluating the FCO 2 in soil without vegetation, found dependence structures ranging from weak to strong.The range obtained for FCO 2 (27.02 m) was similar to that found by Brito et al. ( 2010) for backslope and foot-slope topographic positions for the the same soil type and vegetation.Panosso et al. (2009), using a sampling grid with 60 points spaced 13.30 m apart (190 × 50 m), observed values ranging from complete independence between the samples to near 73.2 m.

Brito et al. (2010) and
The correlation of FCO 2 with the BD was closer (r = -0.41,p < 0.01), but the high similarity between the ranges in the variogram models fitted to FCO 2 (27.02 and 25.39 m) and AFPS (30.93 and 26.50 m) suggests a higher spatial correlation between them, which was possibly the variable with the strongest influence on FCO 2 in this study.The BD and OM had lower (11.28 and 12.74 m) and higher (49.72 and 46.59 m) range values, respectively, indicating a lower spatial continuity in the BD around OM.
Each realization represents a realistic image of the spatial distribution of the variable without the smoothing effect created by OK (Delbari et al., 2009).The differences between the realizations provide alternative visual measures of the spatial uncertainty associated with estimates (Deutsch & Journel, 1998).The standard deviation maps represent the point variation in all simulated realizations (n=300); in other words, they represent the local uncertainty associated with predictions based on observed values and the sampling grid (Grunwald et al., 2007).Thus, the deviations become zero at the points of observed values, due to the conditional characteristic of the simulation (Figure 2).There is a predominance of high FCO 2 values (> 3.30 μmol m -2 s -1 ), in the upper right and center of the map, whereas low values (> 1.80 μmol m -2 s -1 ), are distributed throughout the area, but with greater consistency in the upper portion of the map.
The various realizations randomly selected for the spatial representation of AFPS demonstrate some specific patterns, such as the presence of intermediate values, 32.95 to 40.45 %, in the upper right and center of the maps.For the soil temperature, consistently low values (< 16.75 °C) are observed in the upper left, whereas intermediate values (19.5 to 20.0 °C) are observed in the central portion of the images.The simulated values above 20.25 °C were associated with a large uncertainty throughout the entire area.Higher values of organic matter with high variability predominate in the central part of the map.Although areas with high BD values are evident in some realizations, their predictions are highly uncertain.For the soil temperature, the highest deviations were associated with lower predicted values.In contrast, for AFPS and BD, the greatest deviations were observed in regions with lower-density samples, as similarly found by Delbari et al. (2009).For OM, high deviations are distributed across the entire experimental area.
The reproduction of the cumulative probability density functions (cdfs) and variograms of the interpolated data were used to evaluate the accuracy of the results provided by OK and SGS (Table 3).For all variables, the simulations better the individual cdfs and variograms reproduced when compared to OK, with values of G statistics and e(g) closer to 1 and 0, respectively.Again, the statistics displayed a high degree of similarity between the E-type estimates and those generated by OK (Table 3 and Figure 2).The low performance of OK in the reproduction of the variogram and cdfs shows that it should not be used when a reproduction of the statistics and variability structure of the data is desired.
Ergodic fluctuations, which are small differences between the observed values and those predicted by SGS, are produced at each step of the simulation procedure (Goovaerts, 2001) and play a key role in increasing uncertainty and making the estimate more conservative.The exact reproduction of these functions is only advisable when measuring the accuracy and reliability of the samples (Deutsch & Journel, 1998).
Another factor that may influence the low reproduction of the cdfs and the variogram by the SGS, originates in the process of restructuring the variogram model due to normalization and the subsequent use of OK in the estimation of local averages, a procedure used to circumvent the lack of stationarity in the observed data (Deutsch & Journel, 1998).Bourennane et al. (2010) claim that all features of the simulation, including the use of simple or ordinary kriging, the maximum number of nodes simulated, the search radius and the choice of upper and lower values of interpolation, significantly affect the quality of the results.
According to Grunwald et al. (2007), the average represents the dominant property signal in the area, whereas the range of possible results is characterized by the minimum and maximum values, representing the best and the worst possible scenario for each variable.The FCO 2 showed a mean variation in an area from 0.36 to 3.85 μmol m -2 s -1 , resulting in a wide range of emission (from 0.18 to 1.85 t CO 2 ha -1 ), over a short time, in different scenarios.This wide range, although within the limits of observed values (Table 1), is due to the presence of areas with lowerdensity samples that resulted in extremely erratic estimates.
Considering the emissions for sugarcane cultivation reported by De Figueiredo & La Scala (2011) and the different estimates generated in this work, FCO 2 is the primary source of emissions in areas with green sugarcane management and is even higher than emissions from N fertilizers and diesel, indicating a significant shift in the composition of greenhouse gas emissions.The map of the minimum FCO2 values shows significant homogeneity in the area, whereas that generated with the highest values is similar to the E-type estimate and the standard deviation map (Figure 2) shows lower values at the top and bottom left and center-right of the map.
The uncertainty associated with estimates of soil properties, combined with the construction of different scenarios, provides useful information on the possible patterns of soil respiration, a property that is extremely erratic and difficult to predict because many properties have a direct or indirect relationship with FCO 2 , with very close interval values.
By the E-type estimates, spatial relationships could be established between FCO 2 and other properties (Figure 2).As indicated by the variogram (Table 2), the AFPS had the highest spatial correlation (r = 0.26, p < 0.01) with FCO 2 .Similarly to the AFPS, although less intense, the correlations of OM and soil temperature were also positive (0.13 and 0.04, respectively, p < 0.01).The BD had a correlation of -0.12 (p < 0.01).Although the soil temperature was correlated with FCO 2 when spatial patterns were considered, the spatial correlation for the other variables was lower, indicating the difficulty of predicting the spatial FCO 2 based on factors that can influence it.
The external validation procedure is often used to compare the quality and application of different interpolation methods (Teixeira et al., 2011a,b); however, in this case, the performance of E-type estimates generated for each variable was evaluated (Bourennane et al., 2010) (Table 4).According to the MEs values, the estimates for the AFPS (-0.07) had the least bias among the variables, slightly underestimating the validation points.The FCO 2 was the variable with the greatest data underestimation, with a MEs value of -0.48.For soil temperature and BD, the models used in the simulation process overestimated the validation set.
OK (6) E-type SGS (7)    According to Hengl (2007), RMSEs values > 0.71 indicate that the models captured less than 50 % of the variability in the points used for validation.Although all RMSEs values were considered high, soil temperature was the variable with the most accurate estimates, with a value of 0.85.The soil temperature estimates were 7.06 (OM) to 34.12 % (FCO 2 ) more accurate than the other variables.

CONCLUSIONS
1.The strong relationships observed between the sampling points and spatial estimates of CO 2 emissions (FCO 2 ), bulk density and air-filled pore space suggest the utility of these as covariates in interpolation procedures of FCO 2 to improve the accuracy of estimates of unsampled locations.
2. The configuration of the sampling grid and the soil covered with crop residues may have influenced the characterization of the spatial distribution of FCO 2 , temperature and OM, leading to a more homogeneous distribution of these properties on a small scale (less than 3 m).
3. The individual simulations maintained the characteristics of the original data better than ordinary kriging, reproducing the original variograms and histograms.
4. By sequential Gaussian simulation, areas with higher uncertainties were identified for the estimated properties, which is particularly important for sample designs that reduce uncertainty and thus increase the spatial accuracy of the estimates.

Figure 1 .
Figure 1.Sampling grid with 141 points to assess FCO 2 , temperature, soil moisture and physicochemical soil properties.(•) Points used in the procedure for external validation.
in distance h s calculated from the estimated/simulated values.Values close to 0 indicate good accuracy in the reproduction of the variogram.

Figure 2 .
Figure 2. Spatial pattern and deviations maps based on sequential Gaussian simulation (SGS) and ordinary kriging (OK).

Table 1 . Descriptive statistics of soil CO 2 emission and influential properties
n=141;