USE OF THE CORRELATION COEFFICIENT BETWEEN PLOTS IN ORDER TO IMPROVE THE ACCURACY OF FOREST INVENTORIES

Forest inventories are usually compiled without taking into account the correlations between sampling units, a practice that is debatable, particularly where the calculations involve environmental variables. When potential correlations between sampling units are overlooked, the accuracy of the inventory is distorted, as reflected in the width of the confidence interval for the variable of interest, here volume in cubic meters. The magnitude and direction of this distortion depend on the strength of the correlation between sampling units. This study analyzed the effect of adding the correlation coefficient to the calculation of the variance of the mean in systematic sampling of a native forest, and its impact on the accuracy of the resulting estimates, comparing the assumption of independence between sampling units with the inclusion of a correlation between sampling units as suggested by Cochran. Results revealed that adding the correlation coefficient to the variance of the mean formula reduced the inventory error by about 14.3% on average, that is, it increased inventory accuracy, leading to the conclusion that such an effect will occur in any forest inventory compiled for any forest population or area of interest.


INTRODUCTION
Preparing a forest management plan with its relevant procedures is only possible by knowing or by estimating the parameters of the forest population in question.
A forest inventory can be defined as an activity that seeks to obtain quantitative and qualitative information on the forest resources in a preestablished area (population); it therefore consists of partially measuring a population, that is, measuring sampling units or plots and then generating estimates for the total area (LEITE; ANDRADE, 2002). More specifically, it estimates the biophysical characteristics of a given forest from direct measurement of individual trees in sampling plots that are representative of the tree population of that forest (RODRIGUEZ et al., 2010). Its purpose is to apply and evaluate sampling systems capable of generating accurate estimates for the population being sampled.
Much of the cost incurred in the forestry sector is concentrated in obtaining the information needed to carry out planning activities. Many forestry studies have sought to optimize the cost-accuracy relationship, including Druszcz et al. (2010), Nakajima et al. (1998), Soares et al. (2004) and Vasquez (1988). It is therefore critical to obtain this information not only at the lowest possible cost but also with the highest possible accuracy (SOARES et al., 2004).

Sé, D. C. et al.
This justifies seeking more specific sampling methodologies for the various forestry sectors. The idea of improving accuracy through sampling procedures with new estimators that add no extra cost and are easy to apply is therefore very attractive.
Several sampling procedures can be used in forestry to assess a population of interest; the most commonly used include Simple Random Sampling, Stratified Random Sampling, Systematic Sampling and Cluster Sampling (PÉLLICO-NETO; BRENA, 1997). It should be noted, however, particularly for native forests, that systematic sampling usually yields the best estimates of the parameters of interest, because the systematic distribution of plots across the field ensures better representativeness of the sampled area (AUBRY; DEBOUZIE, 2001) and therefore better captures the variation within it.
One drawback of systematic sampling is that an unbiased estimator of the variance of the mean cannot be derived from the data of a single sample. This is because the selection of sampling units is not independent: only the first unit is chosen at random. Several methods have been proposed to approximate the sampling error of a systematic sample (SOUZA, 2007). In populations with heterogeneity between sampling units, or with a defined trend, one alternative for estimating the variance and the sampling error is the successive difference formula, which is based on the premise that the sampling units are not completely independent (CAMPO; LEITE, 2006). Cochran (1977) argues that systematic sampling is accurate when the units within the same sample are heterogeneous and inaccurate when they are homogeneous. This is intuitive: if there is little variation within a systematic sample, successive sampling units repeatedly provide the same information.
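The successive difference approach mentioned above can be sketched in Python. This is a minimal sketch of the common textbook form of the estimator (as presented in Cochran, 1977), with the finite-population correction (1 - n/N); the function names are hypothetical, and it is not necessarily the exact variant used by CAMPO and LEITE (2006). Plot volumes are assumed to be listed in the order the units were laid out in the field.

```python
from statistics import variance


def srs_variance_of_mean(y, N):
    """Usual simple random sampling estimator: (1 - n/N) * s^2 / n."""
    n = len(y)
    return (1 - n / N) * variance(y) / n


def successive_difference_variance(y, N):
    """Successive-difference estimator of the variance of the mean.

    Replaces s^2 by sum((y[t+1] - y[t])^2) / (2 * (n - 1)), so that a
    smooth trend along the sample line does not inflate the variance.
    The values in y must be in the order the units were laid out.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two sampling units")
    ssd = sum((b - a) ** 2 for a, b in zip(y, y[1:]))
    s2_sd = ssd / (2 * (n - 1))
    return (1 - n / N) * s2_sd / n


if __name__ == "__main__":
    # Hypothetical plot volumes (m3) with a linear trend along the sample line.
    trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print("simple random sampling:", srs_variance_of_mean(trend, N=12))
    print("successive differences:", successive_difference_variance(trend, N=12))
```

Under a trend, consecutive differences are small compared with the overall spread of the values, so the successive-difference estimator returns a much smaller variance than the simple random sampling formula.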
Because of the great diversity of native forests, obtaining accurate estimates can be difficult even when they are stratified into sub-populations. One possible solution is to add to the estimators used in systematic sampling a variable that captures the variation between the installed plots; in other words, besides the sample variance already present in the estimator, to add a variable that better explains this diversity. This variable is the correlation coefficient between plots, as proposed by Cochran (1977) and used by Mello (2004).
The correlation coefficient follows classical statistical theory, as it considers only the relationship between sampling units and not their spatial location; it thus acts as a measure of population homogeneity (COCHRAN, 1977). Overlooking potential correlations between sampling units can distort estimates of population variability. In other words, if the correlation between sampling units is ignored, the resulting confidence intervals are overestimated or underestimated, depending on the strength of the disregarded correlation (MINGOTI; FIDELIS, 2001).
The central idea is that the estimator of the variance of the mean used in systematic sampling of native forests does not adequately describe the variance around the population mean, which distorts the confidence interval. Even though systematic sampling provides the best spatial representativeness of the population, the usual estimator loses information that is strongly present in systematic samples, namely the relationship between sampling units. The estimator proposed by Cochran (1977) can capture these relationships better by adding the correlation coefficient to the estimator of the variance of the mean.
This study aimed to test the effectiveness of the correlation coefficient in improving the accuracy of a forest inventory based on systematic sampling, and to compare the performance of the traditional estimator of the variance of the mean with that of the estimator proposed by Cochran (1977).

Study site
The study was conducted in a montane seasonal semideciduous forest covering 5.04 hectares in the municipality of Lavras (MG), at coordinates 21°13'40''S and 44°57'50''W and an altitude of 925 meters (Figure 1). The local climate is Cwb according to the Köppen classification, that is, temperate with mild summers and dry winters. The soil is predominantly a dystrophic red latosol with a very clayey texture (CURI et al., 1990).

Data collection
Dendrometric data were collected from every individual in 126 contiguous 20 x 20 m plots. In each plot, all individuals were marked with metal tags showing the plot number and tree number. All individuals were measured for merchantable height and circumference at 1.30 m above ground (CBH). Individual tree volume was estimated with an equation selected by Scolforo et al. (1994), and individual volumes were summed to obtain the total volume of each plot.

Data processing
Data were processed to obtain the percent error of the inventory and the confidence interval in two situations: (1) with the variance of the mean calculated by the simple random sampling estimator, and (2) with the correlation coefficient between sampling units added to the simple random sampling estimator.
Because all individuals in the area were enumerated (census), the population parameters were known. This made it possible to simulate eleven systematic samples in the area, which differ in the sampling interval k, set as a function of the allowable error (E%). The calculations produced two possible samples for an allowable error of 7.5% (k = 2 plots, n = 63 plots), three possible samples for an allowable error of 10.6% (k = 3 plots, n = 42 plots) and six possible samples for an allowable error of 17% (k = 6 plots, n = 21 plots). The allowable errors were chosen for practical reasons: an allowable error of 7.5% is generally used in commercial forest inventories, while errors of 10.6% and 17% were accepted in order to obtain a larger number of simulated systematic samples in the area.
The data were thus organized into eleven databases for the same site, one for each possible systematic sample. The eleven databases fall into three groups: Group A, with a sampling interval of 2 plots and therefore two databases; Group B, with a sampling interval of 3 plots and therefore three databases; and Group C, with a sampling interval of 6 plots and therefore six databases.
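The construction of the eleven databases can be sketched as follows. This minimal Python sketch assumes that the 126 contiguous plots are numbered 1 to 126 in a single field order (the ordering actually used is not stated in this section); a systematic sample with interval k and start r takes plots r, r + k, r + 2k, and so on. The function name is hypothetical.

```python
N_PLOTS = 126  # 126 contiguous 20 x 20 m plots (census)
GROUPS = {"A": 2, "B": 3, "C": 6}  # sampling interval k for each group


def systematic_samples(n_plots, k):
    """Return all k possible systematic samples as lists of plot numbers.

    Sample r (r = 1..k) contains plots r, r + k, r + 2k, ...; with
    n_plots divisible by k, each sample has n = n_plots // k plots.
    """
    if n_plots % k != 0:
        raise ValueError("n_plots must be a multiple of k")
    return [list(range(start, n_plots + 1, k)) for start in range(1, k + 1)]


# One entry per group, each holding that group's possible samples.
databases = {
    group: systematic_samples(N_PLOTS, k) for group, k in GROUPS.items()
}

if __name__ == "__main__":
    for group, samples in databases.items():
        print(group, "k =", GROUPS[group], "samples:", len(samples),
              "n =", len(samples[0]))
```

Because 126 is divisible by 2, 3 and 6, every start yields a complete sample, and the k samples of each group together cover all 126 plots exactly once.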
Using the eleven databases, the first part of inventory processing applied the simple random sampling estimators, assuming independence between sampling units. These estimators are given in several books on forest inventory sampling, including Cochran (1977), Scolforo and Mello (2006) and Thompson (1992). The usual estimator of the variance of the mean is:
$$S_{\bar{y}}^{2} = \frac{S^{2}}{n}\left(1 - \frac{n}{N}\right) \qquad (1)$$
The second part of inventory processing followed the same procedure as the first, except for the estimate of the variance of the mean, to which the correlation coefficient was added according to formula (2), as proposed by Cochran (1977) and used by Mello (2004):
$$S_{\bar{y}}^{2} = \frac{S^{2}}{n}\left(\frac{N-1}{N}\right)\left[1 + (n-1)\rho_{w}\right] \qquad (2)$$
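where $N$ is the number of sampling units in the area, $n$ is the sampling intensity used in the area, $S^{2}$ is the variance of the data obtained in the survey, and $\rho_{w}$ is the correlation coefficient between pairs of units from the same systematic sample, defined by formula (3):
$$\rho_{w} = \frac{2}{(n-1)(N-1)S^{2}} \sum_{i=1}^{k} \sum_{j<u}^{n} \left(y_{ij} - \bar{Y}\right)\left(y_{iu} - \bar{Y}\right) \qquad (3)$$
where $y_{ij}$ is the member of order $j$ of the systematic sample of order $i$ ($j = 1, 2, \ldots, n$; $i = 1, 2, \ldots, k$), $y_{iu}$ is another member of the same sample with $j < u \le n$, $\bar{Y}$ is the mean of the individual values, $n$ is the sampling intensity in the area, $N$ is the number of plots in the area, and $S^{2}$ is the variance of the data.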
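Formulas (1) to (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' R routine: formula (1) is taken as (S²/n)(1 - n/N), formula (2) as Cochran's (S²/n)((N - 1)/N)[1 + (n - 1)ρ_w], and formula (3) as ρ_w = 2/[(n - 1)(N - 1)S²] times the sum, over all samples i and pairs j < u, of (y_ij - Ȳ)(y_iu - Ȳ). It also assumes that the census values are listed so that systematic sample i is y[i], y[i + k], y[i + 2k], ... (0-based), and that Ȳ and S² in formula (3) come from the full census of N = nk plots. The function names and the example data are hypothetical.

```python
from statistics import mean, variance


def rho_w(y, k):
    """Formula (3): correlation between units of the same systematic sample.

    y holds the census values in field order; sample i (0-based) is
    y[i::k]. Y-bar and S^2 (divisor N - 1) are taken from the census.
    """
    N = len(y)
    if N % k != 0:
        raise ValueError("len(y) must be a multiple of k")
    n = N // k
    y_bar = mean(y)
    s2 = variance(y)
    pair_sum = 0.0
    for i in range(k):
        d = [v - y_bar for v in y[i::k]]
        # Sum over pairs j < u of d_j * d_u, via ((sum d)^2 - sum d^2) / 2.
        pair_sum += (sum(d) ** 2 - sum(x * x for x in d)) / 2
    return 2 * pair_sum / ((n - 1) * (N - 1) * s2)


def var_mean_usual(s2, n, N):
    """Formula (1): simple random sampling estimator with (1 - n/N)."""
    return (s2 / n) * (1 - n / N)


def var_mean_cochran(s2, n, N, rho):
    """Formula (2): Cochran's variance of the mean including rho_w."""
    return (s2 / n) * ((N - 1) / N) * (1 + (n - 1) * rho)


if __name__ == "__main__":
    # Hypothetical census of N = 12 plot volumes (m3), interval k = 3.
    y = [4.1, 5.0, 3.8, 4.6, 4.9, 4.2, 4.4, 5.3, 3.9, 4.8, 4.7, 4.0]
    k = 3
    rho = rho_w(y, k)
    sample = y[0::k]  # one possible systematic sample (n = 4)
    s2 = variance(sample)
    print("rho_w  :", rho)
    print("usual  :", var_mean_usual(s2, len(sample), len(y)))
    print("Cochran:", var_mean_cochran(s2, len(sample), len(y), rho))
```

When S² is taken from the census, formula (2) reproduces exactly the variance of the k possible systematic-sample means (Cochran, 1977), which gives a convenient check of the routine; in an actual inventory, S² in formulas (1) and (2) is the variance of the selected sample, as in the example.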
All analyses, charts and the routine for calculating the correlation coefficient were implemented in R (R DEVELOPMENT CORE TEAM, 2010). The routine is available from the author upon request.

RESULTS AND DISCUSSION
Table 1 provides the main descriptive statistics from the forest inventory processing, together with the correlation coefficient between sampling units; this information should be included in an exploratory analysis. The data refer to the three groups of systematic samples installed in the area. The estimated mean was similar in all three groups, regardless of sampling intensity. This is expected, because the sample mean is an unbiased estimator (MAGINA et al., 2010).
All sampling procedures used in forest inventories are grounded in the assumption of independence between sampling units, which is debatable, particularly where calculations involve environmental data. By overlooking the potential correlations between sampling units, one could be distorting the estimates of variability for a given population (MINGOTI; FIDELIS, 2001).
The correlation coefficients were small and negative. According to Mundstock (2006), when the correlation coefficient is high and positive, the units of a systematic sample are homogeneous, whereas when it is low, whether positive or negative, the units of the systematic sample are heterogeneous. The correlation coefficient is therefore a measure of the homogeneity of a systematic sample.
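The link between ρ_w and homogeneity can be illustrated with two small hypothetical populations. This Python sketch assumes the census layout described for the correlation coefficient (systematic sample i is y[i::k]) and recovers ρ_w from the spread of the k systematic-sample means, through the identity (S²/n)((N - 1)/N)[1 + (n - 1)ρ_w] = (1/k)Σ(ȳ_i - Ȳ)² (Cochran, 1977). When every unit within a sample has the same value (homogeneous samples), ρ_w reaches 1; when the units within each sample span a trend (heterogeneous samples), ρ_w is negative. The function name and data are hypothetical.

```python
from statistics import mean, variance


def rho_w_from_sample_means(y, k):
    """rho_w recovered from the variance of the k systematic-sample means."""
    N = len(y)
    if N % k != 0:
        raise ValueError("len(y) must be a multiple of k")
    n = N // k
    y_bar = mean(y)
    # Variance of the k possible sample means around the census mean.
    v_between = sum((mean(y[i::k]) - y_bar) ** 2 for i in range(k)) / k
    inflation = v_between * n * N / (variance(y) * (N - 1))
    return (inflation - 1) / (n - 1)


# Homogeneous samples: the pattern repeats every k = 3 plots, so every
# unit in a given systematic sample has the same value.
periodic = [1.0, 5.0, 2.0] * 4
# Heterogeneous samples: a steady trend, so each sample spans the range.
trend = [float(v) for v in range(1, 13)]

if __name__ == "__main__":
    print("periodic:", rho_w_from_sample_means(periodic, 3))
    print("trend   :", rho_w_from_sample_means(trend, 3))
```

In the trend case the three sample means (5.5, 6.5 and 7.5) are close to one another, so the between-sample variance is small and ρ_w falls to about -0.26, near its lower bound of -1/(n - 1) = -1/3.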
Table 2 provides the errors of the simulated inventories of the forest population, computed either under the assumption of independence between sampling units or with a correlation measure added, along with the percent differences between them. In Table 2, Error 1 follows the assumption of independence between sampling units, Error 2 considers the correlations between sampling units, and dif (%) is the percent difference between the two estimated errors.

Formula (2) shows that a positive correlation between the units of a single sample inflates the variance of the sample mean. Because of the multiplier (n - 1), even a small positive correlation can have a strong effect.
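The effect of the multiplier (n - 1) can be checked with a few lines of arithmetic. This hypothetical Python helper evaluates the factor 1 + (n - 1)ρ_w that multiplies the variance of the mean in Cochran's formula, for the three sample sizes used here (n = 63, 42 and 21) and an illustrative ρ_w of 0.01, a value chosen for illustration and not taken from Table 1. It also returns the smallest admissible ρ_w, -1/(n - 1), below which the variance would become negative.

```python
def inflation_factor(n, rho):
    """Factor 1 + (n - 1) * rho multiplying the variance of the mean."""
    return 1 + (n - 1) * rho


def min_rho(n):
    """Smallest rho_w that keeps the variance non-negative: -1 / (n - 1)."""
    return -1 / (n - 1)


if __name__ == "__main__":
    for n in (63, 42, 21):
        print(f"n = {n:2d}: factor at rho_w = 0.01 -> "
              f"{inflation_factor(n, 0.01):.2f}, "
              f"minimum rho_w = {min_rho(n):.4f}")
```

With ρ_w = 0.01 the factor is 1.62 for n = 63, 1.41 for n = 42 and 1.20 for n = 21: a correlation of only 0.01 raises the variance of the mean by about 62% in the largest sample, and the standard error by roughly 27% (√1.62 ≈ 1.27).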
For natural populations, Cochran (1977) argues that there is reason to expect two observations, y_i and y_j, to be more alike when i and j are close together in the sample than when they are farther apart. According to the author, this happens whenever natural forces produce gradual changes along the sample. To express this effect mathematically, one can assume that y_i and y_j are positively correlated, with a correlation that depends only on the distance separating them and decreases as that distance increases. Although this is an oversimplification, it may represent an important aspect of many native forest populations.

An analysis of Tables 1 and 2 reveals, first of all, that within each group the sample with the greatest variance was also the sample with the greatest percent difference between the two estimated errors.
This result indicates that, in native forest stands, systematic sampling is more accurate when the correlation coefficient is used to estimate the variance of the mean; in the simulations run here, it improved the accuracy of the forest inventory. Consequently, the usual formula for the variance of the mean does not efficiently capture the variability present in the area when systematic sampling is used.
For the area in question, inventory errors were invariably smaller when a correlation measure between sampling units was added than when the inventory assumed independence between sampling units (Figure 2). Across all three groups, the error was reduced by 14.3% on average when the correlation coefficient was included in data processing. Because the error changes, so does the width of the confidence interval. Figure 3 shows that, under the assumption of independence between sampling units, the confidence interval is wider than the actual variability warrants. Adding a correlation measure between the sampling units increases accuracy and, consequently, yields a narrower confidence interval.
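The inventory error, the confidence interval and the percent difference between the two errors can be sketched as below. This sketch assumes the usual forest-inventory definitions, which are not spelled out in this section: E% = t·s_ȳ/ȳ·100 and CI = ȳ ± t·s_ȳ, where s_ȳ is the square root of the variance of the mean and t is a Student's t value supplied by the user (for example, from a table for n - 1 degrees of freedom), and dif (%) = (Error 1 - Error 2)/Error 1·100. The function names and the numbers in the example are hypothetical.

```python
import math


def inventory_error(y_bar, var_mean, t):
    """Return (error in %, (lower, upper)) for the confidence interval.

    y_bar: estimated mean volume per plot (m3)
    var_mean: variance of the mean (usual or correlation-adjusted)
    t: Student's t value for the chosen confidence level (supplied)
    """
    half_width = t * math.sqrt(var_mean)
    return 100 * half_width / y_bar, (y_bar - half_width, y_bar + half_width)


def percent_difference(error_1, error_2):
    """Relative reduction of Error 2 with respect to Error 1, in %."""
    return 100 * (error_1 - error_2) / error_1


def covers(interval, mu):
    """True if the population mean mu lies inside the interval."""
    lower, upper = interval
    return lower <= mu <= upper


if __name__ == "__main__":
    # Hypothetical values, not the results of Table 2.
    y_bar, t = 4.5, 2.0
    e1, ci1 = inventory_error(y_bar, 0.0100, t)  # independence assumed
    e2, ci2 = inventory_error(y_bar, 0.0073, t)  # correlation included
    print(f"Error 1 = {e1:.2f}%, Error 2 = {e2:.2f}%, "
          f"dif = {percent_difference(e1, e2):.1f}%")
    print(covers(ci1, 4.5298), covers(ci2, 4.5298))
```

Because the half-width is proportional to the square root of the variance of the mean, any reduction in that variance narrows the interval around the same estimated mean, which is the pattern described for Figure 3.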
Based on these results, this study recommends that forest inventory data be subjected to exploratory analysis before processing, with special emphasis on whether the sampling units are correlated. If a correlation exists, estimators that take it into account should be used.
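As one possible exploratory check for correlation between sampling units, the Python sketch below computes the lag-h autocorrelation of plot volumes taken in field order along a systematic sample. This is a generic, swapped-in time-series statistic, not the procedure used in this study: values near zero suggest weak correlation between neighboring units, while clearly positive or negative values suggest that an estimator accounting for correlation deserves consideration. The function name and data are hypothetical.

```python
from statistics import mean


def lag_autocorrelation(y, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(y)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(y) - 1")
    y_bar = mean(y)
    d = [v - y_bar for v in y]
    denominator = sum(x * x for x in d)
    if denominator == 0:
        raise ValueError("all values are equal; autocorrelation undefined")
    # Cross-products of deviations for units that are `lag` positions apart.
    numerator = sum(d[t] * d[t + lag] for t in range(n - lag))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical plot volumes (m3) along one systematic sample.
    volumes = [4.2, 4.6, 4.4, 5.1, 4.9, 4.3, 4.0, 4.7, 4.5, 4.8]
    for h in (1, 2, 3):
        print(h, round(lag_autocorrelation(volumes, h), 3))
```

A steadily increasing sequence gives a clearly positive lag-1 value, while volumes that alternate between high and low plots give a negative one.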
Figure 3 shows that all the confidence intervals generated were reliable, in that they contained the population mean (4.5298 m³). The intervals generated with the correlation coefficient proposed by Cochran (1977) not only contained the population mean but were also narrower, confirming that inventory accuracy improved without any loss of reliability in the estimates.

Figure 1 - Map of the study site with the delimited plots.

Figure 2 - Comparison of inventory errors when using the variance of the mean based on the assumption of independence between units (dashed line), as opposed to when using the Cochran (1977) formula (solid line).

Figure 3 - Comparison between the confidence interval coverage generated by the usual estimator (solid line) and by the estimator that considers the correlation between plots (dashed line), and the population mean of the stand, 4.5298 m³ (horizontal solid line).

Table 1 - Estimates of the mean, coefficient of variation (CV%) and correlation coefficient ($\rho_{w}$), by group according to sampling intensity, for the variable volume (m³).

Table 2 - Estimates of inventory error (%) for the different systematic samples simulated in the study site.