Sampling Processes and Intensities for the Geostatistical Modeling of an Unevenly Aged Forest

The aim of this study was to estimate the basal area, through traditional sampling processes and geostatistical interpolation, and compare it to the forest census carried out in 25 ha of a Mixed Ombrophilous Forest remnant. The sampling simulations were structured for the systematic sampling, two stages, and simple random, in intensities 5, 11, and 22% of the potential sample units of 25 × 25 m. The increase in sample intensity for the systematic sampling and two stages revealed a higher level of spatial behavior details of the basal area, whereas when comparing the census with sampling, it was observed that the geostatistical interpolation shows ability to improve the accuracy of the basal area estimation.


INTRODUCTION
Forest inventory has quantitative and qualitative information about forests (Campos & Leite, 2013).Regarding data collection, forest inventories can be classified into the total enumeration or census, in which all the individuals of the population are observed and measured, obtaining true values, while only a part of the population is observed for the sampling and an estimation of its parameters is obtained, which conveys an error (Péllico & Brena, 1997).
The process for performing the census of a forest is considered a time-consuming and onerous task (Machado & Figueiredo, 2006), hence the vast majority of forest inventories are carried out by sampling (Péllico & Brena, 1997).Thus, the sampling process and its sample intensity become the basis for estimating the characteristics of a population (Scolforo & Mello, 2006), and whose inadequate choice may underestimate or overestimate the forest interest variable (Corte et al., 2013).Some researchers have evaluated and obtained some conclusions about different sampling processes and intensities in native forest inventories, such as Higuchi (1987) who observed that systematic sampling is more accurate than random sampling in commercial inventories in humid tropical rainforest on the mainland of Manaus; Ubialli et al. (2009) in an ecotonal forest in the northern region of Mato Grosso and Cavalcanti et al. (2009) in a native area located in the municipality of Sena Madureira in the Acre State, finding that the parcels/stands with the smallest error do not depend on the sampling process, while smaller errors were produced at higher sample intensities; and Corte et al. (2013) to express diversity in Araucaria forests in the municipality of Fernandes Pinheiro in Parana State, who found that the random process better represented the number of species in smaller sampled areas, whereas the systematic process better represented the number of species in larger sampled areas.
Significant evolution has been observed in the development of technologies that aid in forest inventories since the beginning of the development of the Geographic Information System, enabling to obtain and process the geographical position of parcels (Brandelero et al., 2008).Also, the introduction of geostatistical modeling has enabled spatial data analysis (Pereira et al., 2016) which presents spatial variability along with some degree of organization or continuity expressed by spatial dependence (Vieira, 2000).
With the detection of the spatial dependence of forest variables, geostatistics becomes an important tool for interpolating information in non-sampled sites (Akhavan et al., 2015).Thus, the construction of thematic maps can aid in forest planning by investigating the influence of different processes and sampling intensities on natural forests, mainly in the basal area, through which it is possible to express the phytosociological parameters and assign wood exploitation (Freitas & Magalhães, 2012;Souza & Soares, 2013).Despite sampling and geostatistics strategies being old techniques applied in forest inventories, there is a need for indicator surveys to compare the estimates of the two methods with respect to the parametric value of the forest.
Thus, considering the hypothesis that the geostatistical interpolator provides statistically similar estimates to the parameters obtained in a forest census executed in remnants of Mixed Ombrophilous Forest, the aim of this study was to estimate the basal area using traditional sampling processes and the geostatistical interpolator, comparing it to the actual values obtained from the forest census.

Study area
The study was conducted in a Mixed Ombrophilous Montana Forest remnant located in the Irati National Forest at an average altitude of 820 m, between the coordinates 25°01'S and 25°40'S and 51°11'W and 51°15'W.The climate of the region is of Cfb type (Köppen), with average temperatures of 17.5 °C and rainfall above 1,500 mm year -1 (Alvares et al., 2013), where the predominant soils are of Latosol and Cambisol orders.

Forest census
The parametric basal area of 30.6 m 2 ha -1 was obtained from performing the census in 25 ha of the forest, in which all individuals with a diameter of 1.3 m (DBH) equal or greater than 10 cm were identified, measured and geo-referenced.The forest

Processes and sampling intensities
The forest remnant area was segmented into a mesh composed of 400 units of 25 m × 25 m, used for composing the samples for simulating processes and sampling intensities.Thus, three intensities were defined for the simulations to standardize the number of units sampled.In the first, a number of 20 sample units was established expressing 5% (1.25 ha) of the reference area (25 ha) (Figure 1A, D and G); the second included the measurement of 44 sample units, representing 11% of the population (2.75 ha) (Figure 1B, E, H); while the third considered 88 units, corresponding to 22% of the potential units (5.5 ha) (Figure 1C, F, I).
The processes were simulated for systematic sampling, two-stage sampling, and simple random sampling.In the systematic sampling, the sample units were selected in the stages between the lines (k 1 ) and between units of the line (k 2 ): a sample intensity of 5% was defined as the range k 1 of 75 m and k 2 of 200 m (Figure 1A); an intensity of 11% presented a k 1 of 75 m and k 2 of 100 m (Figure 1B), while a k 1 of 75 m and k 2 of 50 m were used for the intensity of 22% (Figure 1C).The interval between lines (k 2 ) was greater than the interval between lines (k 1 )due to the rectangular shape of the fragment, The reference area was divided into 25 primary units (N) of one hectare each (100 m × 100 m) and subdivided into 16 secondary units (M) of 25 m × 25 m for the structural organization of the two-stage sampling.For the sampling intensity of 5%, five primary units (n) and later four secondary units (m) were randomized (Figure 1D).For the intensities of 11 and 22%, each primary unit consisted of 16 secondary units (M), of which four (Figure 1E) and eight units (m), respectively, were randomly selected from each selected primary unit (Figure 1F).
The method of selection without replacement was chosen for the simple random sampling, in which all possible combinations of potential sample units of the population had equal opportunity to participate in the sample, and 20, 44 and 88 sample units were selected for the intensities of 5% (Figure 1G), 11% (Figure 1H) and 22% (Figure 1I), respectively.
The Grubbs (1969)  5% level of significance for analysis of the outliers.After their analysis, the descriptive statistics were calculated.Additionally, the normality of the data was verified by the Kolmogorov-Smirnov test (KS) for the probability of 95%.Next, the estimators of the sampling processes were calculated according to Péllico & Brena (1997), with the accuracy evaluated by the absolute (Ea) and relative (Er) errors and confidence interval (CI) at a 95% probability level.

Geostatistical modeling
The geostatistical analysis was applied for the spatial dependence model through semivariance (Equation 1), considering the geographical position of the sample units in the field, the subsequent computation of distances (h), and the numerical differences of the variable (Z) in the mesh of points (Isaaks & Srivastava, 1989).The semivariances were determined in four directions in the spatial plane: 0°, 45°, 90°, and 135°, and matrix of the mean half-variances between the equivalent distances was obtained and the possible anisotropic phenomena were verified (Yamamoto & Landim, 2013).

( ) ( ) ( )
where: γ(h) = semivariance of the Z(x i ) variable; h = distance; and N(h) = number of pairs of Z(x i ) and Z(x i +h) points measured, separated by a distance of h.
The spherical, exponential and Gaussian models were used to describe the spatial dependence structure using the GS + program (Robertson, 2008).For their adjustment, the theoretical semivariogram structure was defined according to Yamamoto & Landim (2013) as C 0 = nugget effect (the value of the semivariance for the zero distance, which represents the random variation component); C = contribution (structural variance); C 0 + C = threshold (the value of the semivariance in which the curve stabilizes at a constant value); and A = range (distance from origin to the point at which the plateau reaches stable values).
The adjustment of the theoretical semivariograms was carried out by the Weighted Least Squares Method, with the evaluation and selection of the models performed based on the minimization of the weighted sum of squared deviations (WSSD), the highest coefficient of determination (R 2 ) and on cross-validation (R 2 ) (Isaaks & Srivastava, 1989), which considers the linear, angular and determination coefficients of the cross-validation (R 2 vc ) and the standard error estimate (S yx %).They were staggered as a way to standardize the comparison between semivariograms according to the recommendation of Vieira et al. (1997).
The classification of Cambardella et al. (1994) was used to analyze the degree of spatial dependence (DD), in which semivariograms with a nugget effect less or equal to 25% of the plateau were considered as a strong DD; between 25 and 75% DD as moderate; and greater than 75% DD as weak.Subsequently, the interpolation was performed by ordinary point kriging (Equation 2), which consists of a non-skewed linear estimator with minimal variance for interpolation in non-sampled positions (Isaaks & Srivastava, 1989), in which the weights ( i λ ) were determined by the Lagrange multiplier technique (Webster & Oliver, 2007).Also, the thematic maps were made using the GS + program (Robertson, 2008), considering the classes with absolute intervals of the basal area.

( ) ( )
where: ( ) The comparison of the mean basal area with the value observed in the census was made using the determination of the real error (E real %) as proposed by Augustynczik et al. (2013) to verify the consistency of the estimates through the simulated traditional samplings and the geostatistical interpolator (Equation 3).Thus, the aim of this study was to evaluate the quality of the geostatistical interpolator in estimating the basal area for native forests.
( ) where: VE = estimated value of the basal area by the sampling process and the geostatistical interpolator; and VR = real value of the basal area resulting from the census.
The computation of the areas for the different classes with absolute basal area intervals was first carried out to quantify the average basal area estimated by Kriging and then the central value of each class was multiplied by the corresponding size for further subdivision of the total experiment area (25ha).

Traditional estimators of sampling processes
No outliers for the basal area variable were observed according to the Grubbs test, leading to a non-rejection of the null hypothesis (Table 1).The asymmetry values that express the tail extension of the distribution, ranged from -0.138 to 0.664, corroborating with the normal distribution at the 5% probability level by the Kolmogorov-Smirnov test (Table 1).
When analyzing the amplitude of the basal area, its direct relation with the sample intensity was noticed, which increased as the sample intensities intensified.This behavior is possibly justified by the diversity found in multi-native forests (Lima & Leão, 2013), whose remainder in the present study showed 117 species belonging to 44 botanical families, equivalent to 567 trees.ha - , and not being sufficiently sampled at lower intensities.
For the estimation of the basal area, the lowest absolute (Ae) and relative (Re) errors for the intensity of 5% were found for the systematic sampling process, while the random sampling resulted in the best estimates for the intensities of 11 and 22% (Table 1), along with a tendency of smaller errors for greater intensities regardless of the sampling process tested, as observed in native forests studied by Cavalcanti et al. (2009) and Ubialli et al. (2009).

Geostatistical modeling
The presence of spatial dependence in all processes and sample intensities used to estimate the basal area was verified (Table 2).In general, a moderate to strong degree (of spatial dependence) was observed for the geostatistical adjustments, similar for each sample intensity regardless of the sampling process used, resulting in the possibility of generating maps regardless of the sample intensity used,corroborating with Kravchenko (2003).
The highest coefficients of determination (R 2 ) and the lowest weighted sum of squared deviations (WSSD) were recorded for the sample intensity of 11% (Table 2) for all processes, in which the systematic sampling resulted in the best geostatistical adjustments, with R 2 of 0.952 to 0.989 and WSSD of 0.001 to 0.007.On the other hand, the lowest R 2 and the highest WSSD values were observed for the two-stage and the random processes at the intensity of 5%.
Overall, the Gaussian model stood out as the best for the semivariogram adjustments; it obtained the lowest WSSD and the highest R 2 , except for the two-stage and the random model at the intensities of 5 and 22%, respectively, in which the Exponential model was chosen in the cross-validation (Table 3).
The adjustments selected for the spatial modeling of the basal area in different sample intensities and processes resulted in linear coefficients of 3.856 to 21.907; angular coefficients between 0.341 and 0.872; cross-validation coefficients (R 2 vc ) of 0.030 to 0.145;    and standard error estimates (S yx %) from 19.7 to 30.6% (Table 3).It was observed that the linear and angular coefficients were distant from the ideal theoretical values, in addition to acceptable low R 2 vc and S yx % considering the heterogeneity characteristics of the tropical forest structure.
With the semivariograms designated by cross-validation, a reduced dispersion of the observed values along the average line estimated for the systematic sampling was observed for all intensities, and for random sampling intensities of 11 and 22% (Figure 2).On the other hand, a greater dispersion was observed for all the intensities of the two-stage process and for the random sampling at the intensity of 5% of the sampled area.In addition, isotropic behavior was admitted due to the similarity of directional semivariograms in the spatial plane.
Estimates of the nugget effect (Co) increased among the sampling processes to estimate the basal area, while the contribution (C) decreased with the increase in sample intensity (Figure 2).Thus, there was a tendency to modify the spatial structure detected by the geostatistical analyzes.According to Mello et al. (2006), it is common to observe lower nugget effect values at higher intensities in the executed inventories with a low number of sample units.
Figure 3 shows that the increase in sample intensity provided greater detail of the kriging information in basal area classes, except for the random process.This detailing in the variation patterns of the maps becomes increasingly degraded with the increase in the distance between the sample units and the decrease in the sampling intensity (Souza et al., 2014).Different spatial patterns were observed among the intensities of the systematic sampling process, with a predominance of class IV of the basal area at intervals ranging from 26.0 to 32.2 m 2 ha -1 ; followed by class III at the intensity of 5%; and by class V at the intensities of 11 and 22% (Figure 3A up to 3C).For the two-stage sampling process (Figure 3D up to 3F), the 5% intensity resulted in three basal area classes and in greater homogeneity, with values between 26.0 and 44.6 m 2 ha -1 , as the intensities of 11 and 22% were represented by the lower classes.Again, a predominance of the class IV basal area was observed.
When comparing the estimates for the random process (Figure 3G up to 3I), the greatest variability was found for sample intensity of 11% with the definition of six classes of basal area, while five and four classes were defined for the intensities of 5 and 22%, respectively.The most extensive stratum was represented by class IV, followed by class III at the 5% intensity, and by class V at the intensities of 11 and 22%.

Comparison between traditional and the geostatistical interpolator estimators
The systematic process tended to present a lower mean basal area than in the random sampling, with errors of -0.5 and 0.3% for the inventory and the geostatistical interpolator, respectively, while the two-stage process resulted in overestimations of 1.1 and 2.8% in relation to the census (Table 4).In general, the estimates resulted in values with mean difference lower than 1%, except for the two-stage sampling at the intensity of 11%.Despite the quality of the estimates by the geostatistical interpolator, the major underestimates and overestimations of the basal area occurred in the random and two-stage processes at the 5% intensity, respectively.Scolforo et al. (2016) evaluated the best spatial technique for mapping the above-ground carbon stock of tree vegetation and identified that no interpolator is robust enough to capture the pattern for extreme values.The largest errors in the census were observed for sample the 5% intensity (Table 4), as reported by Augustynczik et al. (2013).
It was observed that all the sampling processes presented adequate estimates for the basal area, with an advantage for the geostatistical modeling in the systematic and the two-stage sampling processes for the intensity of 5%.The geostatistical interpolator resulted in a smaller error for the random process at 11% intensity, while the traditional sampling estimator presented the best estimates for the intensity of 22% in the two-stage and random processes.
In studies with the species Quercus suber in Spain, Montes et al. (2005) observed that when the basal area presented spatial continuity, random sampling has a greater standard error than systematic sampling.Contrary results have been observed by Lundgren et al. (2015) in Eucalyptus stands in the semiarid of Pernambuco, where random sampling showed the best results for estimating the individual eucalyptus volume performed by the geostatistical analysis.Thus, it is observed that spatial continuity is influenced by the different forest physiognomies, in addition to the various methods and sampling intensities.

CONCLUSION
Traditional forest inventory estimators and the geostatistical interpolator present adequate results when compared to those obtained by forest census, in which the average difference is less than 1%.Thus, incorporating the spatial component in forest inventory processing has the ability to improve the accuracy of basal area estimates of native forests, while the increase in sample intensity for systematic and two-stage sampling allow a greater level of detail of the basal area spatial behavior for composing thematic maps using geostatistical modeling.

Figure 1 .
Figure 1.Allocation of sample units for the sample intensities and processes in a Mixed Ombrophilous Forest remnant.
) = experimental data; and n = number of sample units.
SI = Sample intensity; C o = Nugget effect; C = Structural variance; A = range; DD = Degree of spatial dependence; R 2 = Coefficient of determination; and WSSD = weighted sum of squared deviations.

Figure 2 .
Figure 2. Staggered semivariograms of the basal area (m 2 ha -1 ) for the sample intensities and processes in a Mixed Ombrophilous Forest remnant.

Figure 3 .
Figure 3. Thematic maps of the basal area for the sample intensities and processes in a Mixed Ombrophilous Forest remnant.

Table 1 .
Estimates of basal area using systematic sampling (1), two-stage sampling (2) and random simple sampling (3) in a Mixed Ombrophilous Forest remnant.
SI = Sample Intensity; KS Test = Kolmogorov-Smirnov; G = Basal area; Ae = Absolute error; Re = Relative error; CI = Confidence interval; and ns = not significant (Grubbs Test = no outliers values in the data series; and KS Test = normal distribution).

Table 2 .
Parameters of the semivariograms adjusted for the basal area in the sample intensities and processes in a Mixed Ombrophilous Forest remnant.

Table 3 .
Cross-validation of the geostatistical adjustments selected for the sample intensities and processes in a Mixed Ombrophilous Forest remnant.

Table 4 .
Mean basal area observed and estimated by the sampling processes and the geostatistical interpolator in a Mixed Ombrophilous Forest remnant.
G = Mean basal area; SI = Sample intensity; and E real = Real error.