ESTIMATING PRECISION OF SYSTEMATIC SAMPLING IN FOREST INVENTORIES

Systematic sampling is the sampling technique commonly used in forest inventories. This study aimed to evaluate the estimator of the variance of the mean proposed by Cochran for systematic sampling in forests with high and low percentages of sampled area. The study areas comprised native vegetation in Minas Gerais. To assess the efficiency of the estimators at high sampling rates (the sampling rate being the percentage of the area sampled), a fragment in which a census had been conducted was used. The remaining fragments represented low sampling rates, and for these fragments the inventory accuracy was determined using the Cochran estimator. In the fragment where the census was conducted, the structure of the correlation coefficient proposed by Cochran remained approximately constant for the area, and as the sampling rate decreased, the impact of the Cochran estimator on the inventory accuracy decreased. For the fragments with a low sampling rate, the sampling rate was a key factor in determining whether the correlation proposed by Cochran had an impact on the forest inventory accuracy. The use of this estimator is indicated for fragments with a sampling rate greater than 10% of the area.


INTRODUCTION
The growing demand for products derived from native or planted forest resources requires accurate assessment of forest growth dynamics through forest inventory (Ubialli et al., 2009). For a precise assessment of forest growth dynamics, it is necessary to apply sampling techniques that can faithfully capture the forest reality (Reis et al., 2007; Assis et al., 2009; Druszcz et al., 2012; Guedes et al., 2012).
Forest inventory is an activity that uses sampling techniques to obtain information on quantitative and qualitative characteristics of existing forest resources in a predetermined area (Mello et al., 2009; Vibrans, 2010). These sampling techniques can estimate population parameters based on the characteristics that are being evaluated. The most common sampling techniques used in forest inventories are simple random, stratified random, and systematic sampling (Scolforo, 2000). Estimators based on randomization were developed by means of probabilities generated by the randomization of the sampling units (Brus; Gruijter, 1997).
Systematic sampling is the technique commonly used in forest inventories, and most ecological studies use techniques that incorporate systematization principles. In forests, randomization is restricted to achieve better spatial coverage of the plots. Cochran (1965) designated this type of plot distribution in planted forests as "mismatched systematic sampling". This type of distribution does not "negatively affect" the principle of statistical randomness; moreover, it produces good spatial coverage of the area and can therefore generate accurate and reliable estimates (Soares et al., 2009).
Systematic sampling has no estimators of its own for the desired population parameters. The simple random sampling (SRS) estimators are frequently used instead, as reported by authors in the sampling field such as Scolforo and Mello (2006). Using the SRS estimator of the variance of the mean for systematic sampling raises statistical problems, because it ignores the dependence between the sampling units (spatial correlation) that systematic sampling captures. On average, there is no bias in the estimated mean. However, the estimated accuracy may be substantially biased, depending on the spatial correlation between the sampling units.
Such correlations are directly associated with the proportion between the installed sampling units and the statistical population in question: for larger distances between the sampling units, the correlation between them is expected to tend to zero, thereby causing minimal impact on the inventory accuracy. Conversely, shorter distances between the sampling units result in correlation values that may bias the inventory accuracy if they are not taken into consideration (Mello, 2006). Cochran (1965) proposed an estimator to evaluate the effects of the correlation between the sampling units on the estimation of the variance of the mean using the systematic sampling technique.
Therefore, the aim of this study was to evaluate the estimator of the variance of the mean proposed by Cochran (1965) for the systematic sampling technique in the following two distinct situations: forests with high and low percentages of the sampled area.

Characterization of the areas and data collection
The study areas comprised various phytophysiognomies of native arboreous vegetation in the state of Minas Gerais (Table 1). These areas were surveyed during the project "Forest Inventory of Minas Gerais" according to the methodology described in Scolforo, Mello and Oliveira (2008a). The selected fragments were intended to faithfully represent the plant biodiversity found in the state of Minas Gerais (Figure 1), thus providing a demonstration of the impact of adopting the correlation coefficient proposed by Cochran (1965).
The cerrado fragment located in Lavras (ID 126), with an area of 3.89 hectares, was completely inventoried, enabling the use of the high sample rate (determined as the percentage of the area sampled) in the performed simulations.
For the present study, data were collected from plots of 400 m² (for fragments with IDs 126 and 152) and 1000 m² (for the remaining fragments). In all evaluated fragments, the plots were distributed systematically, with the distance between them defined from the total area and the number of sampling units to be evaluated in each situation.
In each plot, the total height was measured with the aid of a height pole, and the circumference at 1.30 m above ground (circumference at breast height, CBH) was measured for all specimens with a minimum circumference of 15.7 cm.
The volume of each sample unit was determined as the sum of the individual volumes, which in turn were obtained from equations fitted in the Forest Inventory of Minas Gerais project (Acerbi Junior, 2008b; Rufini et al., 2010). Finally, the volume of each sample unit was converted to a per-hectare basis.

Data Processing
The response variable evaluated in the present study was the volume per hectare for each fragment studied. Data processing consisted of calculating the mean, variance of the mean, and confidence interval using the estimator of SRS and the estimator proposed by Cochran (1965). These processes were performed in the two situations proposed in the study.

Fragment with a high sampling rate (Case 1)
To determine the efficiency of the estimators in a situation with a high sampling rate, we used the fragment with ID 126 where a census was conducted. In this situation, we evaluated the precision and the accuracy of the simulations performed, considering that the true mean of the fragment (μ) was known. Furthermore, it was possible to evaluate two high sampling rates (33% and 50%) in the total area, with the plots being systematically distributed throughout the study area.

Fragment with a low sampling rate (Case 2)
This is the most common situation in forest inventories. These surveys generally involve large areas and suffer from budget constraints for inventories. This implies large distances between the plots and results in a low degree of correlation between them.

Inventory processing
For each fragment, the forest inventory was processed using the traditional SRS estimators of the mean, variance of the mean, and confidence interval (Mello 2006); the confidence interval was generated only for the fragment where the census was conducted, to determine the sampling accuracy. The estimator of the mean is the same whether the plots are randomized or systematized. In the present study, the basic distinction lies between the variance of the SRS mean (Equation 1) and the estimator proposed by Cochran (1965) (Equation 2), into which the correlation of Equation 3 is substituted. In these equations, S² is the sampling variance; n is the sampling size; N is the number of possible plots present in each area; and ρ is the correlation coefficient proposed by Cochran (1965) between pairs of sampling units.
The correlation coefficient proposed by Cochran (1965) between the sampling units is given by Equation 3, in which n, N and S² are as defined above; k is equal to N/n; y_ij denotes the jth member of the ith systematic sample, with j = 1 to n and i = 1 to k; y_iu denotes the member of the same sample at a distance u from y_ij; and ȳ represents the sample mean.
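For orientation, the textbook forms of these three expressions, written with the symbols defined above, can be sketched as follows. They are assumed forms rather than a transcription of Equations 1 to 3: in particular, whether the SRS variance carries the finite population correction (1 - n/N) is an assumption, and the textbook definition of ρ takes deviations from the population mean, whereas ȳ is used here to follow the definition given in the text.

```latex
\begin{align}
S_{\bar{y}}^{2} &= \frac{S^{2}}{n}\left(1 - \frac{n}{N}\right) \tag{1} \\
S_{\bar{y}}^{2} &= \frac{S^{2}}{n}\left(\frac{N-1}{N}\right)\left[1 + (n-1)\rho\right] \tag{2} \\
\rho &= \frac{2}{(n-1)(N-1)S^{2}} \sum_{i=1}^{k} \sum_{j<u} \left(y_{ij} - \bar{y}\right)\left(y_{iu} - \bar{y}\right) \tag{3}
\end{align}
```

In form (2), a negative ρ makes the bracketed term smaller than 1 and so reduces the variance of the mean, while a positive ρ inflates it.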
After obtaining the variance of the mean, it was possible to determine the effect of adding the correlation coefficient proposed by Cochran to the inventory accuracy for both situations.
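The comparison between the two variance estimators can be sketched in Python. This is a minimal illustration under stated assumptions, not the processing routine used in the study: the function names are hypothetical, the formulas are the textbook forms (SRS variance with finite population correction; Cochran's variance with the intra-sample correlation ρ), ρ is computed from all k systematic samples as is possible only with a census, and the six-value population is toy data.

```python
from itertools import combinations


def srs_variance_of_mean(s2, n, N):
    """SRS variance of the mean, assuming the finite population correction applies."""
    return (s2 / n) * (1 - n / N)


def cochran_rho(samples):
    """Correlation between pairs of units within the same systematic sample.

    `samples` holds all k systematic samples of equal size n (census case),
    so deviations are taken from the mean of every value supplied.
    """
    values = [y for sample in samples for y in sample]
    N = len(values)
    n = len(samples[0])
    mean = sum(values) / N
    s2 = sum((y - mean) ** 2 for y in values) / (N - 1)
    # Sum of cross-products over every pair (j < u) inside each sample.
    cross = sum(
        (a - mean) * (b - mean)
        for sample in samples
        for a, b in combinations(sample, 2)
    )
    return 2 * cross / ((n - 1) * (N - 1) * s2)


def cochran_variance_of_mean(s2, n, N, rho):
    """Textbook form of Cochran's variance of the systematic-sample mean."""
    return (s2 / n) * ((N - 1) / N) * (1 + (n - 1) * rho)


# Toy population of N = 6 units split into k = 2 systematic samples of n = 3.
toy_samples = [[1, 3, 5], [2, 4, 6]]
toy_s2 = 3.5  # variance of the six toy values with divisor N - 1
toy_rho = cochran_rho(toy_samples)
print("rho:", toy_rho)
print("SRS variance of the mean:", srs_variance_of_mean(toy_s2, 3, 6))
print("Cochran variance of the mean:", cochran_variance_of_mean(toy_s2, 3, 6, toy_rho))
```

In this toy case ρ is negative, so the Cochran variance is smaller than the SRS variance, which is the direction of the effect discussed for the studied fragments.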
For the completely inventoried fragment, the estimates make it possible to assess not only the precision but also the accuracy of the inventory.
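The step from a variance of the mean to an inventory error and a confidence interval can be sketched as below. The sketch assumes a common forest inventory convention (error as the half-width of the confidence interval expressed as a percentage of the mean); the function name, the sample mean, the variance of the mean and the Student t value are hypothetical inputs chosen for illustration, and only the census mean of 116.78 m³/ha comes from the study.

```python
import math


def inventory_error_and_interval(mean, variance_of_mean, t_value):
    """Return the percentage error and the confidence interval of the mean.

    `t_value` is the Student t quantile for the chosen confidence level and
    degrees of freedom; it is supplied by the caller rather than computed here.
    """
    standard_error = math.sqrt(variance_of_mean)
    half_width = t_value * standard_error
    error_percent = 100 * half_width / mean
    return error_percent, (mean - half_width, mean + half_width)


# Hypothetical inputs: sample mean 120.0 m3/ha, variance of the mean 36.0,
# t = 2.12 (roughly the 95% two-sided value for 16 degrees of freedom).
error, (lower, upper) = inventory_error_and_interval(120.0, 36.0, 2.12)

# Accuracy check available only when the true mean is known from a census.
census_mean = 116.78
covers_true_mean = lower <= census_mean <= upper
print(error, lower, upper, covers_true_mean)
```

Precision is read from the width of the interval, while accuracy is read from whether the interval contains the census mean.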

Effect of the correlation coefficient proposed by Cochran - Case 1
In this first case, we evaluated the correlation coefficient in a study area of 3.89 ha for which the census provided the parametric values used in the forest inventory processing. The aim was to evaluate the effect of the distances between the evaluated plots.
In this fragment, it was possible to generate 5 systematic samples with sampling rates of 33% and 50% of the total area. The maximum number of plots that fit in this area is 34. Therefore, two databases with 17 plots each, representing 50% of the area, were generated. For a sampling rate of 33%, two databases with 12 plots and a third database with 10 plots were generated. In all these situations, the plots were distributed systematically. The descriptive statistics for each sample are shown in Table 2; note in particular the maximum distance separating the sampling units (D).
The mean parameter (μ) in Case 1 was 116.78 m³/ha. Considering the estimated means for each sampling rate, it was observed that the mean of sample 5 was the most accurate, because it was the closest to the parameter.
Notably, even with a reduction in the sampling rate, the structure of the correlation coefficient proposed by Cochran (1965) tends to remain approximately constant for the area. A study conducted by Sé et al. (2013) corroborates this, showing that the structure of the correlation coefficient remains approximately constant for the same area at different sampling rates. This can be explained by the fact that variations in the sampling size (n), present in the denominator of the formula for the correlation of Cochran (1965), also affect the estimated variance. Variations in the numerator (differences between each plot and the mean) therefore produce a proportional change in the two components of the ratio (numerator and denominator), generating values of the Cochran correlation that are very similar to each other for the same fragment. Consequently, the correlation structure of the population tends to persist across sampling rates, provided that the systematic sample covers the whole area spatially.
As the correlation structure for the fragment remains constant, it could be verified that the sampling size (n) will determine how the addition of the correlation coefficient will impact the estimator of the variance of the mean, i.e., the greater the sampling rate, the larger the impact of this factor on the inventory error. Table 3 clearly emphasizes this point by demonstrating that as the sampling rate decreases, the impact of correlation on the inventory accuracy decreases [difference in the error estimations between SRS and Cochran (1965)]; however, this still had a good advantage over the use of the classical estimator for the evaluated fragment. It was verified that there was a reduction in the percentage error between the SRS and the Cochran estimator, ranging from 14% to 28% approximately. It was observed that even when 50% of the area was sampled with the SRS estimator, there was no error lower than 10%. Using the correlation coefficient for samples 1 and 2, the error was approximately 10%.
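The role of n can be illustrated with the bracketed term of the textbook form of Cochran's variance, 1 + (n - 1)ρ, which multiplies the variance of the mean. The sketch below assumes that form; the function name and the value of ρ are hypothetical, and only the sample sizes (10, 12 and 17 plots) are taken from the text.

```python
def correlation_factor(n, rho):
    """Multiplier applied to the variance of the mean by the correlation term."""
    return 1 + (n - 1) * rho


rho = -0.03  # hypothetical correlation, held constant across sampling rates
for n in (10, 12, 17):
    print(n, round(correlation_factor(n, rho), 2))
```

With ρ held constant, the factor moves further from 1 as n grows (0.73 for 10 plots and 0.52 for 17 plots in this hypothetical case), so the same correlation has a larger effect on the estimated error at higher sampling rates.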
As the precision of the error estimation of the inventory for this fragment increases, the use of the estimator proposed by Cochran yields a narrower confidence interval (Table 4). Notably, the true mean (116.78 m³/ha) fell within the generated intervals.
SRS: confidence interval using the estimator of simple random sampling; Cochran: confidence interval using the estimator coefficient proposed by Cochran (1965).

Effect of the correlation coefficient proposed by Cochran - Case 2
In this case, we used fragments of larger areas, where the sampling rate was low and, consequently, the distance between the plots (D) was greater.
The first point to note is the coefficient of variation (CV), which represents the spatial variability of the vegetation in the state of Minas Gerais. The CVs vary widely, from an intermediate range, e.g., for ID 145, a Submontane Seasonal Semideciduous Forest, to very high values, e.g., for ID 152, a Montane Ombrophilous Forest, indicating considerable dispersion around the mean. Table 5 presents the information on the fragments, the distances between the plots and the descriptive statistics, and also highlights the behavior of the correlation structure: the correlation increasingly approaches zero as the fragment size increases.
Increasing the area decreases the correlation value because, in the formulation of the correlation proposed by Cochran (1965), N (the number of plots that fit in the area, which increases with the area) has a strong impact on the denominator of the correlation. Thus, as N increases, the correlation value (ρ) approaches zero.
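A short worked step may help to show why N dominates. It is a sketch under stated assumptions: ρ takes the textbook form shown in the first line (assumed here), and it is evaluated over all pairs of a single systematic sample, with ȳ and S² computed from that same sample. The shorthand d_j = y_j - ȳ is introduced only for this illustration.

```latex
\begin{align}
\rho &= \frac{2}{(n-1)(N-1)S^{2}} \sum_{j<u} d_j d_u, \qquad d_j = y_j - \bar{y} \\
\sum_{j<u} d_j d_u &= \frac{1}{2}\left[\Big(\sum_{j=1}^{n} d_j\Big)^{2} - \sum_{j=1}^{n} d_j^{2}\right]
  = -\frac{1}{2}(n-1)S^{2}, \quad \text{since } \sum_{j=1}^{n} d_j = 0 \\
\rho &= -\frac{1}{N-1}
\end{align}
```

Under these assumptions ρ is negative and shrinks toward zero as N grows, regardless of n, which is consistent with the behavior described for the larger fragments.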
This indicates that, for both large and small areas, the impact of the correlation depends on the percentage of the area sampled (sampling rate). The sampling size (n) is essential for improving the estimation of the variance of the mean when using the estimator proposed by Cochran (1965). For large areas, where ρ will certainly approach zero, a high sampling rate compensates for this effect and makes the correlation relevant to the inventory accuracy.
Although the correlations presented in this study were negative, according to Cochran (1965) the correlation may be positive for certain areas. A positive correlation indicates that the SRS estimator underestimates the inventory error, whereas a negative correlation indicates that it overestimates the error.
Table 6 shows that the estimator proposed by Cochran had a greater effect as the sampling rate increased. The two smaller areas, ID 105 and ID 152, were the most intensively sampled, with 3% and 4% of their total area sampled, respectively. Even so, the effect of using the correlation was low.
The use of the estimator proposed by Cochran (1965) will be effective for sampling rates above 10%. Sé et al. (2013) observed that a sampling rate of 16% of the total area was sufficient for the correlation effect to have a good impact on the precision of the inventory, and this impact was approximately 7%.
Naturally, it can be inferred that the estimator proposed by Cochran (1965) is a more feasible alternative for small areas, considering the high probability that at least 10% of these areas will be sampled. For very large areas, even with a large sampling size, the sampling rate will probably be low on account of the total area. This indicates an almost null effect for the use of the estimator proposed by Cochran (1965).
SRS: inventory error using the estimator of simple random sampling; Cochran: inventory error using the estimator coefficient proposed by Cochran (1965).