Site classification for Eucalyptus sp. in a tropical region of Brazil

Abstract The aim was to determine the productive capacity of a forest site by applying different methods of fitting, combined with geostatistical techniques, to stands of Eucalyptus sp. in a tropical region of Brazil. Data were collected from 845 plots of a continuous forest inventory over four years. The classification of local production capacity was performed using growth curves obtained by the guide curve (GC) method, algebraic difference approach (ADA) and generalized algebraic difference approach (GADA) methods and ordinary kriging through the spherical, exponential, and Gaussian models to determine the spatial dependence of the variables site, geographical boundaries of site index classes, and their respective areas for each hectare in an annual production unit (APU). The modified Chapman–Richards model, fitted by the generalized difference approach method (GADA), provided the best statistical results, an improvement of 12.23% and 39.80% on the ADA and GC methods, respectively. The exponential model selected to express the spatial distribution of dominant height showed a high degree of spatial dependence.


INTRODUCTION
Growth evaluation in planted forests is essential, due to the requirements of strategic nature: the good performance of silvicultural practices, genetic improvement, and maximization of production gains.The ability of a given species or genetic material to grow in each site encompasses climatic, edaphic, and biological factors (Scolforo 1998).
The site index, which is the average height of dominant trees at a certain reference age (Clutter et al. 1983, Burkhart & Tomé 2012) is strongly correlated to growth factors (soil and agrometeorological variables) and volume yield.It is also strongly influenced by density and treatments like thinning, which provide a reliable source of environmental information.
According to Andrade (2004), the site index is currently the more practical and widespread method to classify forest productivity, since it uses dominant height as a variable, which is the response to inter-related environmental factors, and is highly correlated with the volume yield.Furthermore, dominant height is relatively independent of the stand density.
The most commonly used methods for constructing production curves through dominant height, with data from permanent and temporary sample plots and stem analysis are: (1) Guide curve (GC), (2) Algebraic difference approach (ADA), (3) Parameter prediction (PP) and (4) Generalized algebraic difference approach (GADA).
To decide which method will be used, some concepts and desirable peculiarities should be considered for the construction of the site curves (Bailey & Clutter 1974, Cieszewski & Bailey 2000, Castillo Lopez et al. 2013).Standard sigmoid growth with an inflection point and the ability to reach a horizontal asymptote at more advanced ages provides a logical response (for example, dominant height must be equal to zero at zero age and the curve should always keep growing), as does invariance in relation to the simulation path (path invariance), and age of reference (base-age invariance).The condition of invariance, related to the method of performing the simulation, implies that when the dominant height H 1 at the age t 1 is estimated with the curves, the value of dominant height H 3 at age t 3 should result in the same value when estimating firstly the dominant height H 2 at age t 2 and then, with this value, the dominant height at age t 3 .
The condition of invariance also implies that the shape of the curves should not vary, regardless of the reference age, which is used to define the site index.However, the age of reference to be used should be close to the expected rotation age, as height, at this age, presents high correlation with the volumetric yield (Pece de Rios 1993).
Basically, two types of site index curves can be constructed: harmonic curves or anamorphic, and natural curves or polymorphic.The anamorphic curves are constructed by fitting the guide curve as a function of age.Subsequently, a series of curves with the same shape are built above and below of the guide curve, differing only in magnitude by a fixed percentage.
The anamorphic curves basically present two sources of errors (Batista & Couto 1986, Andrade 2004): (1) Anamorphic curves are only accurate when the dominant height sampling is properly carried out, so that the variation in the site index be equally represented across all ages; (2) Anamorphic curves consider the influence of the site variation in dominant height growth to be uniform in all ages, so that the shape of the height growth curves is the same for all sites.
Polymorphic curves are constructed using data from permanent plots or stem analysis.They assume that the shape of the growth curves of the trees varies from site to site, i.e., they do not have proportionality to the height growth of dominant trees between different site classes.There seems to be a consensus among most researchers (Scolforo 1993, Cieszewski & Bailey 2000, Weiskittel et al. 2011, Burkhart & Tomé 2012) that the polymorphic curves are superior to anamorphic in terms of the accuracy of the site index estimate.
The guide curve method enables only the construction of anamorphic curves and, therefore, is the simpler method and is less indicated among other more elaborated methods.
The methods of algebraic differences and prediction of parameters, however, have been presented as an effective alternative to the construction of the polymorphic site curves, assuming the assumptions of different growth rates between different sites and invariance in the path of the simulation and reference age.Bailey & Clutter (1974) formalized the invariance property in site index age models and presented a technique to derive equations for a dynamic method, known as the algebraic difference approach (ADA), which essentially involves An Acad Bras Cienc (2023) 95(1) e20200038 3 | 15 replacing a parameter of the basic method to express it as a function of the site (in this case, a combination of dominant height and age).The main limitation of the ADA methodology is that most of the derived models are anamorphic or have an asymptote (Bailey & Clutter 1974, Cieszewski & Bailey 2000).Cieszewski & Bailey (2000) introduced a generalization of the ADA methodology, the generalized algebraic difference approach (GADA).The main advantage of this method is that the basic equation can be expanded according to several growth theories, as the rate of growth and asymptote, allowing more than one parameter of each model to be a function of the site quality, and the growths curves obtained are more flexible (Cieszewski & Bailey 2000, Cieszewski 2001, 2002, 2003).With this generalization, we can get growths curves that are polymorphic and have multiple asymptotes (Cieszewski 2002).
The classification of a traditional site, regardless of the method, as here exposed, considers the average of a specific sample point evaluated, not taking into consideration possible correlations between units surveyed in nearby regions (Bognola et al. 2009, Mello et al. 2006).
Therefore, the use of geostatistical techniques provides significant functional space, since potential existing space dependence between sample units can be explained in an integrated manner.Yet, as a bigger advantage, locations that are not sampled can receive unbiased qualitative or quantitative allocation and are correlated and interpolated to the sampled points through the application of multivariate linear regression techniques.
This tool brings advantages for modeling the site and the sampling procedures on different forms, allowing the definition of geographical spatialization of selected variables, area quantification, stratification of a more specific population, improvement of statistical estimators and a reduction in sampling error, generating tools for the physical delimitation of production units and for precise silvicultural practices, as demonstrated by Péllico Netto et al. (2014) andPelissari (2015).
In Brazil, the exhaustion of available areas for forest cultivation in places traditionally used by producers triggered the process of expansion of the production frontier to sites with distinct edaphoclimatic characteristics, where modeling methods of growth and production are still little known.
Allied to these factors, the concept of precision management using geostatistical methods allows a more effective and accurate management of forest stands, even reducing production costs.
However, the site is an important and essential variable used in the improvement of statistical sampling estimators, when it is used as a basis for stratification of the population (Péllico Netto et al. 2014).The purpose of this research was to generate local productivity classes, through three methods (GC, ADA, and GADA) to build up the site curves for Eucalyptus sp. in a tropical region of Brazil, by means of permanent sample plots annually measured between 2011-2014, and the application of geostatistics to classify the cartographic bases of the plantations.
The main objective of this study was to evaluate the productive capacity of forest sites based on site index by means of traditional methods and geostatistical techniques for clonal stands of Eucalyptus sp. in a tropical region of Brazil.For the more specific objectives of fitting models for the classification of forest sites, the methods of guide curve, algebraic difference approach, and generalized algebraic difference approach were applied.Additionally, ordinary kriging was applied to the dominant heights at sites to test the spatial dependence of this variable and, if so, plot the spatial boundaries of the site classes in the cartographic base of the forest stands, according to their site index at a reference age.

MATERIALS AND METHODS
To conduct this research, 2,452 pairs of dominant height (h dom ) and age from 845 permanent sample plots, systematically distributed in a sampling intensity of 1:2.5 ha, in stands of Eucalyptus sp. in a tropical region of Brazil were used, as presented in the characterization of the study area.
Assmann's principle (1970) was used to obtain the average dominant height for each sample plot, based on the 100 trees with the largest diameter per hectare, healthy and free of defects.The classification of productive capacity was tested using three methods for fitting the site index -SI: GC, ADA, and GADA.
The minimum, maximum, means values and age, as well as the standard deviations (SD) of diameter (d 1,30m ), total height (h) of the data are shown in Table I.
The data collection was performed in permanent sample plots, rectangular shape, with variable area (PPAV), allocated through a systematic sampling procedure using a square grid of 158.1 m per side.
Five non-linear models were fitted, which are well known among forestry professionals, because their flexibility to describe a standard sigmoid with an inflection point and ending up in a horizontal asymptote, as follows: 0 . 1 - (1) Guide-model: Model 2: Bailey 4 Parameters Basic equation: Model 3: Modified Chapman-Richards model, Clutter (1998) Basic equation: Where: H 0 is the dominant height at the initial age t o Guide-model: Model 4: Modified Chapman-Richards by Amaro et al. (1998) Basic equation: Guide-model: Model 5: Chapman-Richards For this model we can use the basic equation (1) Parameters of site: θ 0 =e (X) , θ 1 = β 0 and θ 2 =β 1 +β 2 /X (9) The solution for X with initial values (t 0 , X 0 ) (10) Dynamic equations: Where: H is the dominant height of the i th tree at age t, in m; H 0 is the dominant height at the initial age t 0 ; X is the value of the function at age t, and X 0 is the reference variable defined as the value of the function at age t 0 , t 1 is age at time 1, in years; t 0 is age at time 0, in years; SI is the Site-Index, in m; 2 1 , ... θ θ θ n are parameters of the model; β 0 , β 1 and β 2 are the coefficients of the fitted site parameters; e is the Euler' exponential; ε is the random error.
Models 1 and 2 were fitted by the guide curve method, models 3 and 4 by the algebraic differences approach method (ADA), and model 5 by the generalized algebraic differences approach (GADA method), proposed by Cieszewski & Bailey (2000) to obtain the site curves.
The regression coefficients of the fitted models using the GC method and ADA were estimated by Levenberg-Marquardt's method for minimization of the non-linear residual sum of squares (Marquardt, 1963), using Statgraphics Centurion XVI software version 16.1.02.For more details about the GADA approach, review the procedure in (Cieszewski & Bailey 2000,

Statistical analysis of the modeling
We evaluated three statistics as a criterion for selecting the best model: lowest standard error of the estimate in percentage (S xy %), higher coefficient of determination (R² adj ) and the best residual dispersion in percentage (R%), as presented in (13-18): yx yx S S % = .100y (16) Where: SQR is Residual Sum of Squares and SQT is the Total Sum of Squares, E i are the residuals dispersion and E(%) are the percentual distribution of the residuals (which were presented in a plot).

Application of ordinary kriging
Geostatistics defines a set of mathematical procedures that allows the recognition and description of spatial relationships between sampled points in each domain.
According to Yamamoto & Landim (2013), the kriging uses information from the variogram to find the optimal weights to be associated with samples of known values, so that it is possible to estimate values of unknown points.It is understood to be a series of regression analysis techniques that aims to minimize the estimated variance from one model, which considers the stochastic dependence between the spatial distributed data.
The main objective of kriging is to define the site limits in each stand of the forest area to compose the stratified estimates.
Ordinary kriging was chosen for the spatial modeling of the site, and is expressed as follows (Pelissari 2015): ( ) ( ) Where: krig Z is the estimator for kriging interpolation; i  are the weights associated with the n data; and ( ) i Z x are the observed data of dominant height at the reference age.
Data were organized in spreadsheets and the models were fitted using the software R (R Development Core Team 2009, Robinson & Hamann 2011).The geoR package, in which a specific CSV file was defined, containing the coordinates (x, y) of the Universal Transverse Mercator (UTM), in the SIRGAS 2000 reference system, was applied for computing the distances between the sampling points.The numerical value of the site at the reference age of seven years for each sample unit was defined as the dependent variable Z. ARC GIS 10.2 software was used to generate thematic maps.
( ) Where: ( ) γ h is the semivariance of dominant height at the reference age ( ) expressed in meters; C 0 is the nugget effect; C is the variance a priori; A is the range.
The objective of this specific procedure was to evaluate the quality of the fitted model by the coefficient of determination (R²) and the weighed sum of squares of deviations (SQDP).The degree of spatial dependence (DSD) of the variable site index was estimated in (23).θ θ

Site classification
All estimated parameters of the fitted models by the GC method, ADA, and GADA were significantly different from zero at 95% probability level, and presented in Table II.As expected, the GC approach presented the worst adjusted coefficient of determination values (R² adj = 0.753~0.755)and precision (S yx % = 8.62~8.65%).
The graphical residual distributions (Figure 2) indicate greater homogeneity as the stands are close to the technical rotation for all curve construction methods, however, it is noticeable that the GADA method provides a more expressive effect to detect the yield capacity, proving the better efficiency of this method.
In general, the five fitted models resulted good statistics of adjusted coefficient of determination, precision, and residual distribution.However, it is noticeable that the GC methodology presented the more expressive residual dispersion, ranging between -45% and 25%, i.e., the worst results.The distributions residual dispersions of the ADA and GADA were balanced and showed more adherence to the data variation, whose values occurred between -22% and 23% for ADA, and between -20% and 20% for GADA.
Evaluating the mean curve generated for each methodology (GC, ADA, and GADA), differences in the curvature pattern, scale, and asymptotic points are not observed, except for the fitted Bailey model by the GC method, which features a horizontal asymptote at an earlier age than that seen in the other fitted models (Figure 3).
The three proposed approaches produced similar results, however the GADA method generated results with smallest standard error of the estimate, highest coefficient of determination, and best residual dispersion, being 12.23% higher than the ADA and 39.80% higher than the GC method.Considering this evaluation, the Chapman-Richards model fitted by the GADA method was the best to represent the data in the present study, therefore the selected model to generate the site index.The complex polymorphic growths curves developed using the Chapman-Richards model (GADA) and the four years' observations of the forest inventory (inventory in 2011 to inventory 2014) can be observed in Figure 4.The stand was classified in two site classes (I and II), due to its homogeneity and the strategy to be adopted for applying stratified sampling.
According to Scolforo (1993), the height growth curve tends to present a more pronounced sigmoid form at more productive sites, a proof that the proportionality between the curves of the site is a failure.Already, in less productive sites, the height growth pattern tends to be smoother, i.e., the inflection point is reached later than in the more productive sites.Therefore, polymorphic models produce growth curves closer to biological reality.
Differences in productivity performance between the site classes and the early asymptotic culmination occur owing to the edaphic factors in the region, once sand soils of quartz of low natural fertility are dominant in these areas, formed by stains of Quartzarenic Neosol, with less than 10% clay content (EMBRAPA 2006).Another important factor is the choice of genetic materials, given that it is a pioneer region,   where the first productive cycles are just being completed for the first time.
When dominant height amplitude in the reference age of seven years was observed, two site index curves were defined at the center-ofclass values of 29.9 m for site I and 24.8 m for site II, whose upper and lower limits and centerof-class are presented in (Table III).
A desirable characteristic is that the data are kept in the same site class along the measurements (David 2014, Clutter et al. 1983).
The mean dominant heights in their respective ages of measurement are shown in (Figure 5), in which the stability of the plots within the same site class is proven for 2011, 2012, 2013, and 2014 measurements.

Ordinary kriging
The fitting of the semivariograms for the dominant height at the age of seven years (h dom ) for the exponential, spherical, and Gaussian models are presented in Table IV, where a high The exponential model resulted in a higher degree of spatial dependence (DSD = 76.9%), which was considered strong when equal to or greater than 75% (Pelissari 2015, Cambardella et al. 1994); a lower nugget effect (C 0 = 0.989); a higher coefficient of determination (R² = 0.94); and a smaller sum of squares of the weighed deviations (SQDP = 0.0002).These results confirm the high spatial dependence of the variable in the present study, corroborating positively to appropriate estimates in non-sampled sites, for delimitation and quantification of site classes in the stands.In addition, Figure 6 shows the semivariograms adjusted to the exponential model, in which their appropriate features explicitly represent the data of the stand.
The exponential model was used for the preparation of the ordinary kriging, and after its completion, the observed and predicted values were used for the preparation of cross validation.In the cross-validation, for the exponential model results in a linear coefficient of 2.8753, angular coefficient of 0.8757, number of neighbors of 6, R² of 0.64 and S yx % of 5.25% (Table V).The R² value was close to ideal (> 0.70).Similar results were obtained by Pelissari (2015).Despite R² being smaller than expected, estimates for the dependent variable resulted in a standard error of estimate (S yx %) of less than 10%, and was therefore considered adequate to represent the domain under study.
After the validation of the ordinary kriging and the exponential model, the processing routines to create the thematic map were generated (Figure 7) and the computation of areas per APU, stands, and site classes.The summary of areas and sites per APU, with their respective weights (W Sj ), is presented in Table VI.

Some recommendations
(1) The results obtained confirmed better efficiency of the GADA for the establishment of the productive capacity in relation to the other tested methods.Therefore, it is recommended due to its more flexibility and its invariance for a reference age and simulation path.This allows obtaining polymorphic or anamorphic curves when it is included in the same equation more than one parameter as a function of the site.
(2) The generated curves and/or the coefficients of the models can be applied to the tropical regions of Brazil with similar edaphoclimatic characteristics in a reliable way, given the wide range of sample data, concerning ages of two to eight years and all detected site classes.Where: R 2 is the coefficient of determination; S yx % is the standard error of the estimate.(3) The fitting of curves in young forest stands (less than three years) should be used with caution owing to the possible influence of silvicultural treatments and fertilization.
(4) Geostatistical techniques are indispensable tool for forest inventories because they allow generating accurate inferences on dominant height for classification of traditional sites in locations not sampled in the field, with a high degree of spatial dependence.We recommend the use of ordinary kriging to delimit, and calculate areas per site class in stands of Eucalyptus sp.In addition, other variables can be tested to differentiate the productive capacity, such as the physicochemical characteristics of the soil, rainfall, and evapotranspiration.

CONCLUSIONS
Among the tested models of Chapman-Richards and Bailey 4-parameters used in the Guide Curve -GC approach, the modified Chapman-Richards model in two versions used in the ADA approach and the Chapman-Richards model with a set of dynamic equations used in GADA approach, the last one is more flexible and precise to describe the relationship between dominant height growth at different ages and degrees of productivity and, thus, more appropriate than GC and ADA approaches.
The dominant height has spatial continuity structure, allowing the appropriate interpolation of productivity classes in the forest stand.
The ordinary kriging, applying the exponential model, resulted in the best performance with a high degree of spatial dependence, and was therefore an efficient tool to delimit the site index and allow the calculation of stratified areas for local production level.Where: UL is the upper limit; LL is the lower limit; and C is the center of the site index.

Figure 1 .
Figure 1.Distribution of the data to Eucalyptus sp. in a tropical region of Brazil.
values, through adjustment of a straightline equation (24).

Figure
Figure 2. Residual distributions of forest site classification in stands of Eucalyptus sp. in a tropical region of Brazil.

Figure 4 .
Figure 4. Site index for stands of Eucalyptus sp. in a tropical region of Brazil.

Figure 3 .
Figure 3. Growth curves of the five fitted models for stands of Eucalyptus sp. in a tropical region of Brazil.

Figure 6 .
Figure 6.Semivariogram of the exponential model for Eucalyptus sp. in a tropical region of Brazil.

Figure
Figure 7. Thematic map of site classification based on ordinary kriging for Eucalyptus sp. in a tropical region of Brazil.

Table I .
Characteristics of the variables for adjusting models to Eucalyptus sp. in a tropical region of Brazil.

Table II .
Coefficients and statistics of the models for forest sites classification of Eucalyptus sp. in a tropical region of Brazil.
*** The coefficients of all equations were significant.

Table III .
Site Index values for Eucalyptus sp. in a tropical region of Brazil.

Table IV .
Results of the semivariograms adjusted for h dom at the age of reference for Eucalyptus sp. in a tropical region of Brazil.Where: R 2 is the coefficient of determination; SQDP is the sum of squares of the weighted deviations.

Table V .
Cross-validation results for the exponential model for Eucalyptus sp. in a tropical region of Brazil.

Table VI .
Summary of areas and sites per APU for Eucalyptus sp. in the tropical region of Brazil.