Allometric models to estimate biomass in restoration areas in the Atlantic rain forest

The objective of the study was to present mathematical models and strategies for fitting equations to estimate dry biomass for tree species in forest restoration areas. The presence of outliers was analyzed in each fitted equation using values of the matrix H (leverage points) and standardized and studentized residuals, and the presence of influential points through DFFITS, DFBETAS and Cook's distance values. Furthermore, the normality, homoscedasticity and independence of residuals were checked. The accuracy of the fitted equations was evaluated by means of R²adj., Syx, analysis of residuals, and the AIC and BIC criteria. The results showed that the model estimating dry biomass as a function of the variables CD², DBH², Hc² and DBH provides the most accurate solution, with Syx = 40.91% and R²adj. = 0.92. We concluded that the performance of this equation improves when it is fitted to data stratified by classes of height-diameter ratio, which reduces the estimated error.


INTRODUCTION
Forests are important for fixing carbon, absorbing it from the atmosphere in the form of carbon dioxide and converting it into carbohydrates to form wood tissues, leaves, seeds and fruits (Rochadelli, 2001). This process reduces the concentration of greenhouse gases (GHG) and, consequently, the risk of global warming caused by their high levels in the atmosphere.
Forests play an important role in the global carbon cycle, being responsible for the uptake of about one-third of anthropogenic carbon dioxide (CO2) emissions to the atmosphere. However, human activities in forests have also been a source of carbon dioxide emissions, with deforestation contributing about a fifth of annual anthropogenic GHG emissions, especially in the tropics. Planting forests for the recovery of degraded areas has become one of the main measures to mitigate global warming, because carbon is stored permanently.
Measuring how well forests store carbon is essential because it allows not only knowing the accumulated biomass but also highlighting its importance within the climate change scenario. Besides providing information on the quantity of CO2 released in the case of slash-and-burn, biomass estimates allow the monitoring and evaluation of the export of nutrients during forest exploitation (Higuchi et al., 1998). The term biomass corresponds to the organic matter stored in a particular ecosystem, composed mainly of carbon structures and mineral elements (Larcher, 1986).
Forest biomass can be quantified by direct and indirect methods. The direct method is destructive, because the tree is felled and its components are weighed. The indirect method relies on estimates, which can be made with satellite images (Goetz et al., 1999; Chen et al., 2003; Drolet et al., 2005; Watzlawick et al., 2009; Corte & Sanquetta, 2007; Frankenberg et al., 2011; Song et al., 2013) or via mathematical models (Rezende et al., 2006; Urbano et al., 2008; Morais et al., 2013b; Melo et al., 2014). In the case of mathematical models, the biomass is estimated as a function of variables easily measured in forest inventories. Linear and non-linear mathematical models can be used in such estimations, but non-linear models are preferred (Baskerville, 1972; Ketterings et al., 2001; Gehring et al., 2004; Sanquetta et al., 2004; Segura & Kanninen, 2005; Wang, 2006; Soares et al., 2006; Kenzo et al., 2009; Moore, 2010; Addo-Fordjour & Rahmad, 2013), because their application enables estimation in more extensive forest areas.
However, modeling should be done with caution because various factors affect biomass production. Some examples are the species composition, stage of forest development, nutritional status and edaphological characteristics (Larcher, 1986), in addition to factors related to respiration and photosynthesis (Kramer & Koslowski, 1972) and the ecological groups to which the species belong (Luo et al., 2014; Barbosa et al., 2014). Vogel et al. (2006) also mention that the biomass estimate of a forest depends on its vegetation type and its location.
The objective of this work was to present alternative mathematical models and strategies for fitting equations to estimate the dry biomass of species in forest restoration areas.

Data collection
The data for this study were collected in a forest restoration area in the municipality of Seropédica, RJ, at the coordinates 22° 43' 34" S and 43° 38' 34" W. The climate of the region, according to the Köppen classification, is Aw (Brasil, 1992), tropical with rainy summers. According to data collected in the last 20 years by the PESAGRO-RJ meteorological station, the closest to the study site, the average annual rainfall is 1,245 mm and the average annual temperature is 23.7 °C, with a relative humidity of 69%.
The data used in this study were obtained from 111 trees, distributed among 50 different species and covering the entire diameter distribution of the individuals measured in the inventory. The species sampled included: Acacia polyphylla DC., Aegiphila sellowiana Cham, Albizia polycephala, Alibertia concolor K. Schum, Anadenanthera macrocarpa (Benth.) Brenan, Anadenanthera falcata (Benth.), Bauhinia forficata Link. Each tree was classified in its respective ecological group and measured for canopy diameter (CD), diameter at 1.30 m above ground (DBH) and total height (Ht). Each tree was then felled and its commercial height (Hc), up to a stem diameter of 5 cm, was measured.
Different fractions of forest biomass from each tree were separated using a complete dissection technique and weighed. Each tree was separated into the following compartments: bole, live branches, dried branches, leaves and roots. All components of the tree were separated and weighed on a scale in the field. For the root sampling, a 0.50 m deep trench was opened around each felled tree and the roots with a diameter equal to or greater than 1 cm were pulled out. The removed roots were cleaned and then weighed. Finally, the total green weight of each felled tree was obtained as the sum of all compartment weights.
After the weighing (total green weight) of each compartment in the field, samples were collected, stored in plastic bags to prevent moisture loss, and sent to the laboratory for analysis. In the laboratory, the samples were weighed on a precision scale to obtain the green weight of the sample. After drying, the dry weight of the material was determined and, from it, the moisture content according to Equation 1, where: MC = moisture content (%); DW = dry weight (kg); GW = green weight (kg).
The dry weight of each compartment was estimated by multiplying the total green weight obtained in the field by its moisture content (%). The total dry biomass of each tree was obtained as the sum of the dry weights of all compartments.
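As a small worked sketch of the scaling step above, the ratio measured on the laboratory sample can be applied to the field green weight of the whole compartment. Because Equation 1 is not reproduced here, the code assumes a wet-basis moisture content, MC = (GW − DW) / GW × 100, so that dry weight = green weight × (1 − MC/100); if MC is instead defined as the dry fraction DW / GW × 100, the dry weight is the plain product described in the text. Function names and all numbers are illustrative and do not come from the study.

```python
def moisture_content_wet_basis(sample_green_kg, sample_dry_kg):
    """Wet-basis moisture content (%) of a laboratory sample (assumed definition)."""
    if sample_green_kg <= 0:
        raise ValueError("green weight must be positive")
    return (sample_green_kg - sample_dry_kg) / sample_green_kg * 100.0


def compartment_dry_weight(field_green_kg, mc_percent):
    """Scale the field green weight of a compartment to dry weight."""
    return field_green_kg * (1.0 - mc_percent / 100.0)


def tree_dry_biomass(compartments):
    """Sum dry weights over compartments.

    `compartments` maps a compartment name to a tuple
    (field green weight, sample green weight, sample dry weight), all in kg.
    """
    total = 0.0
    for field_green, sample_green, sample_dry in compartments.values():
        mc = moisture_content_wet_basis(sample_green, sample_dry)
        total += compartment_dry_weight(field_green, mc)
    return total


# Hypothetical tree with made-up weights
example_tree = {
    "bole": (20.0, 0.50, 0.25),
    "live branches": (8.0, 0.40, 0.20),
    "leaves": (2.0, 0.20, 0.05),
}
print(round(tree_dry_biomass(example_tree), 2))
```

Keeping the sample-level and compartment-level steps in separate functions makes it easy to swap in whichever moisture definition Equation 1 actually uses.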
During the felling of the 111 trees in the field, wood disks were removed at DBH level from each tree to quantify the wood basic density (d), which was determined according to Equation 2:

$$d = \frac{M}{V} \qquad (2)$$

where: d = wood basic density (g cm⁻³); M = dry mass (g); V = saturated volume (cm³).

Mathematical modeling
Sixteen mathematical models were fitted by means of five different strategies in order to estimate the total dry biomass as a function of the variables: diameter at 1.30 m above ground (DBH), total height (Ht), commercial height (Hc), average canopy diameter (CD) and wood basic density (d) (Table 1). These variables were chosen because they are basic properties of a tree and are easy to measure. Density was included as an auxiliary variable in an attempt to improve the performance of the models in estimating tree biomass. The descriptive statistics estimated for each variable analyzed were mean, standard deviation, coefficient of variation, standard error of the mean, absolute and percentage sampling error, and confidence interval (Péllico Netto & Brena, 1997).
In strategy I, the single-entry models 1 to 6 were fitted as a function of DBH only. In the double-entry models 7 to 12, total dry biomass was fitted as a function of the independent variables DBH and Ht.
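To make the double-entry case concrete, the sketch below fits one common double-entry form, the Schumacher-Hall model (named later in the results), in its log-linear version ln(Y) = b0 + b1·ln(DBH) + b2·ln(Ht) by ordinary least squares. The exact model forms of Table 1 are not reproduced here, so this formulation is an assumption, as are the helper names and the synthetic data. The study itself used R and SAS; this pure-Python version is only meant to show the mechanics.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (each row already includes the leading 1 for the
    intercept); y is a list of responses. Solved by Gauss-Jordan elimination
    with partial pivoting.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_schumacher_hall(dbh, ht, biomass):
    """Fit ln(Y) = b0 + b1*ln(DBH) + b2*ln(Ht) and return [b0, b1, b2]."""
    X = [[1.0, math.log(d), math.log(h)] for d, h in zip(dbh, ht)]
    y = [math.log(w) for w in biomass]
    return ols(X, y)


def predict_schumacher_hall(coefs, dbh, ht):
    """Back-transform to the original scale: Y = exp(b0) * DBH^b1 * Ht^b2."""
    b0, b1, b2 = coefs
    return math.exp(b0) * dbh ** b1 * ht ** b2


# Synthetic, noise-free data generated from known coefficients (not study data)
dbh = [2.0, 3.5, 5.0, 6.5, 8.0, 10.0]
ht = [2.5, 3.0, 5.5, 4.0, 7.0, 6.0]
true_coefs = (-2.0, 2.0, 0.8)
biomass = [predict_schumacher_hall(true_coefs, d, h) for d, h in zip(dbh, ht)]
print([round(b, 3) for b in fit_schumacher_hall(dbh, ht, biomass)])
```

With real, noisy data, predictions back-transformed from a logarithmic fit are generally biased downward unless a correction factor is applied; that step is left out of this sketch.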
In strategy II, models 1 to 6 were fitted as a function of wood basic density only. Models 7 to 12 were fitted using wood basic density and DBH as independent variables. In strategy III, models 13 and 14, proposed by Chave et al. (2005), were considered and named Chave 1 and Chave 2, respectively. According to Nogueira et al. (2008), these models are appropriate for including wood basic density as an independent variable when estimating total dry biomass.
In strategy IV, the modeling was performed by a stepwise procedure using the variables DBH, Ht, Hc, CD and d in their original (untransformed) and transformed forms. In the Type 1 model, the variables were used only in their original form. In the Type 2 model, the transformations applied were: inverse, squared, logarithmic, square of the inverse, logarithm of the inverse and all combinations of them, totaling 66 possible variables. In both cases, after analyzing the correlation matrix between variables, the best mathematical combination was selected by the stepwise method.
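The candidate pool for the Type 2 model can be illustrated by applying the five named transformations to each of the five measured variables. This sketch yields 30 candidates (5 untransformed plus 25 transformed); the 66 variables reported in the text also include combinations whose exact construction is not specified, so they are not reproduced here. The dictionary keys and the example values are hypothetical.

```python
import math

# The five transformations named in the text, applied to a positive value x
TRANSFORMS = {
    "inv": lambda x: 1.0 / x,
    "sq": lambda x: x ** 2,
    "ln": lambda x: math.log(x),
    "inv_sq": lambda x: 1.0 / x ** 2,
    "ln_inv": lambda x: math.log(1.0 / x),
}


def candidate_variables(tree):
    """Return a dict of candidate predictors for one tree.

    `tree` maps variable names (e.g. "DBH", "Ht") to positive measurements.
    The untransformed variables are kept alongside their transformations.
    """
    candidates = dict(tree)
    for name, value in tree.items():
        for label, func in TRANSFORMS.items():
            candidates[f"{label}({name})"] = func(value)
    return candidates


# Hypothetical tree; values are illustrative only
tree = {"DBH": 4.0, "Ht": 5.0, "Hc": 2.0, "CD": 3.0, "d": 0.5}
cands = candidate_variables(tree)
print(len(cands))
print(cands["sq(DBH)"], cands["inv(Hc)"])
```

Note that ln(1/x) equals −ln(x), so those two candidates are perfectly collinear; inspecting the correlation matrix before the stepwise search, as the text describes, is one way to catch such redundancies.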
Finally, the model with the best performance in the previously described strategies was applied in strategy V. To this end, the model was fitted to the data stratified into classes of diameter, height-diameter ratio, and ecological group of species. The diameter classes were: DBH < 5 cm (Group A), with 67 trees; 5 ≤ DBH < 9 cm (Group B), with 37 trees; and DBH ≥ 9 cm, with 7 trees. The Ht/DBH ratio classes were: Ht/DBH < 1 (Group C), with 39 trees; and Ht/DBH > 1 (Group D), with 72 trees. Finally, the ecological group classes were: Pioneer (Group F), with 80 trees; and Secondary (Group G) species, with 31 trees.
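The stratification can be sketched as a simple grouping step that precedes the model fit in each class. The class limits below are the ones given in the text; the function names and tree records are hypothetical, and a ratio of exactly 1 is left unassigned because the text does not say which class it belongs to.

```python
def diameter_class(dbh_cm):
    """Diameter class from the limits given in the text (cm)."""
    if dbh_cm < 5:
        return "DBH < 5"
    if dbh_cm < 9:
        return "5 <= DBH < 9"
    return "DBH >= 9"


def ratio_class(ht, dbh):
    """Ht/DBH class; a ratio of exactly 1 is not assigned in the text,
    so it is returned as None here rather than guessed."""
    ratio = ht / dbh
    if ratio < 1:
        return "Ht/DBH < 1"
    if ratio > 1:
        return "Ht/DBH > 1"
    return None


def stratify(trees, key):
    """Group tree records by the class returned by `key`."""
    strata = {}
    for tree in trees:
        strata.setdefault(key(tree), []).append(tree)
    return strata


# Hypothetical records: (DBH in cm, Ht in m)
trees = [(3.2, 4.1), (4.8, 3.9), (6.5, 7.0), (8.9, 6.2), (12.4, 9.5)]
by_diameter = stratify(trees, key=lambda t: diameter_class(t[0]))
by_ratio = stratify(trees, key=lambda t: ratio_class(t[1], t[0]))
print({k: len(v) for k, v in by_diameter.items()})
print({k: len(v) for k, v in by_ratio.items()})
```

Each stratum returned by `stratify` would then be passed separately to the fitting routine, which is why the number of trees per class matters.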
As there was no combination of stratification criteria, the number of trees used for fitting in each class was considered sufficient, ensuring the control of tree variability in the diameter classes.

In the models of Table 1: Y = total dry biomass; xi = independent variable, which can assume the values of DBH = diameter at 1.30 m from the ground (cm); Ht = total height (m); Hc = commercial height (m); CD = average canopy diameter (cm); d = basic density (g cm⁻³) and their transformations, namely, inverse, squared, logarithmic, square of the inverse, logarithm of the inverse and all combinations of them; ln = natural logarithm; e = estimation error. Models 1 to 12 were adapted from Scolforo (2005) and models 13 and 14 from Chave et al. (2005).
Significant differences between the equations fitted to the entire data set and those fitted to the stratified data, considering the strategies previously mentioned, were examined by means of a model identity test (Regazzi, 1992).
The accuracy of the equations, fitted in all cases, was evaluated by means of the adjusted coefficient of determination (R²adj.) (Equation 3), the standard error of the estimate in percentage (Syx%) (Equation 4), the Akaike (AIC) and Bayesian (BIC) information criteria, graphical analysis of residuals, and the significance of the coefficients by means of a t-test at the 95% probability level.

In Equations 3 and 4 and in the information criteria: R²adj. = adjusted coefficient of determination; Syx = absolute standard error of the estimate; Syx% = standard error of the estimate in percentage; yi = observed dry biomass (kg); ŷi = estimated dry biomass (kg); ȳ = arithmetic mean of observed dry biomass (kg); n = number of observed data; p = number of coefficients of the model; SQres. = sum of squares of residuals; SQtotal = total sum of squares; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion; ei = error estimate; and ln = natural logarithm.
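The statistics defined above can be computed as in the sketch below. The typeset equations are not available here, so the code uses the standard textbook forms that match the listed symbols: R²adj. = 1 − (SQres/SQtotal)·(n − 1)/(n − p), Syx = √(SQres/(n − p)), and Syx% = 100·Syx/ȳ. The AIC and BIC expressions are the usual least-squares versions and are an assumption; the original may use a likelihood-based form that differs by a constant. The function name and example values are hypothetical.

```python
import math


def fit_statistics(observed, predicted, p):
    """Goodness-of-fit statistics for a fitted equation.

    observed, predicted: sequences of dry biomass (kg); p: number of
    coefficients in the model, intercept included.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [y - yhat for y, yhat in zip(observed, predicted)]
    sq_res = sum(e ** 2 for e in residuals)
    sq_total = sum((y - mean_obs) ** 2 for y in observed)

    r2_adj = 1.0 - (sq_res / sq_total) * (n - 1) / (n - p)
    syx = math.sqrt(sq_res / (n - p))
    syx_pct = syx / mean_obs * 100.0
    # Least-squares forms of the information criteria (assumed; additive
    # constants are dropped, so only differences between models matter)
    aic = n * math.log(sq_res / n) + 2 * p
    bic = n * math.log(sq_res / n) + p * math.log(n)
    return {"R2adj": r2_adj, "Syx": syx, "Syx%": syx_pct,
            "AIC": aic, "BIC": bic}


# Made-up observed and predicted values, for illustration only
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.0, 8.5, 9.5]
stats = fit_statistics(observed, predicted, p=2)
print({k: round(v, 4) for k, v in stats.items()})
```

Because Syx% divides by the mean observed biomass, data sets dominated by small trees can show large percentage errors even when the absolute error is modest.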
In addition, a diagnosis of the models was done for each fitted situation, as described in Scolforo (2005). The projection matrix H was analyzed for leverage points to check the occurrence of outliers in the data base, and the standardized residuals (rp) and Studentized residuals (rs) were used to detect outliers in the total dry biomass variable. The significance of discrepant observations was verified by the Bonferroni test at the 95% probability level. Influential points on the model fit were detected by means of DFFITS, DFBETAS and Cook's distance values. The Shapiro-Wilk test, the White test, and the Durbin-Watson statistic were applied to check normality, homoscedasticity and independence of the residuals, respectively, at the 95% probability level, applying the Box-Cox transformation when needed. When appropriate, multicollinearity was also examined to check the existence of correlation between the independent variables. The model fitting and the statistical analyses applied in this research were done with the software R (R Development Core Team, 2001) and SAS (Statistical Analysis System).
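Two of the diagnostics above are simple enough to sketch without a statistics library. The code computes the diagonal of the matrix H for the special case of a single predictor plus intercept, flags leverages above the 3p/n cut-off quoted in the results, and evaluates the Durbin-Watson statistic from a residual series. It is a simplified illustration with made-up data, not the procedure run in R or SAS; multi-predictor models need the full matrix H.

```python
def leverages_simple(x):
    """Diagonal of the hat matrix H for a one-predictor model with intercept:
    h_i = 1/n + (x_i - mean(x))^2 / Sxx."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage(h, p):
    """Indices whose leverage exceeds the 3p/n cut-off."""
    cutoff = 3.0 * p / len(h)
    return [i for i, hi in enumerate(h) if hi > cutoff]


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest uncorrelated residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Illustrative predictor values with one isolated large tree (not study data)
dbh = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 20.0]
h = leverages_simple(dbh)
print(round(sum(h), 6))       # the leverages sum to p (here 2)
print(high_leverage(h, p=2))  # index of the isolated observation
print(durbin_watson([0.5, -0.4, 0.3, -0.2, 0.1]))
```

A high leverage value only says that an observation is unusual in the predictor space; whether it actually distorts the fit is what DFFITS, DFBETAS and Cook's distance are meant to assess.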

RESULTS AND DISCUSSION
The statistics in Table 2 indicate that total dry biomass (DW) was among the most variable of the tree attributes measured in the forest restoration area. The coefficient of variation for this variable was 140% and the sampling error approximately 20%. An error of less than 10% indicates that the sample is representative of the population for estimating the variable in question (Péllico Netto & Brena, 1997). However, the sampling error obtained in this research can be considered low in view of the high variability of dry biomass among trees in this forest community (Melo et al., 2014; Morais et al., 2013a).
The error was less than 10% for the variables DBH, Ht and Hc, confirming the representativeness of the sample. The low mean values of these variables characterize young stands in the development phase. Furthermore, all trees sampled in the study area belong to the ecological group of pioneer or secondary species.
The presence of outliers and influential points was analyzed in all the fitting strategies, in addition to normality, homoscedasticity and independence of residuals. Only in the case of the Type 2 model fitted according to strategy V did the values of the matrix H and the Standardized (rp) and Studentized (rs) residuals indicate three observations as possible leverage points or outliers in relation to the dependent variable dry weight. These observations had values of the matrix H above the critical value of 0.1351 (3p/n), while their residuals remained within the acceptable limit (-3 ≤ rp or rs ≤ 3). However, the non-significant result (p-value > 0.05) of the Bonferroni test at 5% probability indicated that these observations are not discrepant values. The DFFITS, DFBETAS and Cook's distance values indicated some influential points in the fit of the Type 2 model. An alternative to improve the fit of the model in the presence of these points is to use robust regression (Cunha et al., 2002).
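The critical leverage quoted above can be checked by hand. Assuming the full data set of n = 111 trees, a cut-off of 0.1351 corresponds to p = 5 coefficients, which would be consistent with an intercept plus four selected variables; this reading of p is inferred from the numbers and is not stated explicitly in the text.

```latex
\frac{3p}{n} = \frac{3 \times 5}{111} = \frac{15}{111} \approx 0.1351
```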
The coefficients and fit statistics of the models used in strategy I are presented in Table 3. The lowest adjusted coefficients of determination were obtained when DBH alone was used to explain the total dry biomass of trees (models 1 to 6). The equations derived from these models also produced the largest standard errors of the estimate (Syx), reaching values greater than 100% and characterizing low accuracy, in addition to high AIC and BIC values. When the variable total height was included in the models (7 to 12), the accuracy of the estimated dependent variable increased, as shown by a decrease in the AIC and BIC values and in the standard error of the estimate, which varied between 60 and 80%. However, this accuracy is still low when compared with that obtained in other studies in which double-entry models were applied to estimate tree biomass in native forests (Watzlawick et al., 2009; Rezende et al., 2006). An alternative would be to apply the models to each species separately, thus reducing the variability caused by the characteristics of different species. Good results can be obtained even when the models include only the diameter (Litton & Kauffman, 2008; Melo et al., 2014). In the work of Melo et al. (2014), the best models to estimate the biomass of "caixeta" (Tabebuia cassinoides (Lam.) DC.) were the ones that included only the diameter as independent variable.
For strategy I, the Schumacher-Hall model was the one that provided the best estimates of dry biomass, even though it did not provide the best fit statistics or the largest adjusted coefficient of determination (R²adj.). This model was used by Scolforo et al. (2008) to estimate the amount of biomass and carbon in different forest physiognomies in the State of Minas Gerais, demonstrating its applicability. It was selected by means of graphical analysis of residuals, which showed a distribution without trends when compared to the other models (Figure 1). The Shapiro-Wilk's test (normality) and Durbin-Watson

When DBH and wood basic density were used as independent variables in the modeling (strategy II), the performance of models 7 to 12 was similar to that observed in strategy I, judging by the fit statistics obtained. The single-entry models (1 to 6) deserve particular attention, because they reveal that wood basic density should not be used alone to explain the total dry biomass. This was demonstrated by the substantial increase in AIC and BIC values, errors greater than 140% and R²adj. values tending to zero (Table 4). This implies that changes in basic density explained little of the variation in dry biomass. For this strategy, the best performance was also achieved with the fitted Schumacher-Hall model, as seen in the graphical distribution of residuals (Figure 2).
In strategy III (Table 5), the models proposed by Chave et al. (2005), which also include wood basic density as an independent variable, presented fit statistics similar to those of the Schumacher-Hall model used in strategies I and II. By comparison, the Schumacher-Hall model fitted as a function of DBH and Ht (strategy I) stood out for providing the best results for the standard error of the estimate (67.42%) and the adjusted coefficient of determination (0.77). Thus, the inclusion of the wood basic density of the species, regardless of the model adopted, is not a technically feasible option when the intention is to estimate the dry biomass in forest restoration areas. This is because the models that included this variable did not generate significant gains in accuracy, as shown in the graphical analysis of residuals (Figure 3). Furthermore, this variable is difficult to measure, as highlighted by Chave et al. (2005), which justifies the use of DBH and Ht alone, as they are easier to measure.
The best equations were derived from modeling by the stepwise method for variable selection, in particular for the Type 2 model. Of the 66 evaluated variables, DBH, DBH², (Hc)² and (CD)² were selected to compose the model and resulted in a fitted equation with an adjusted coefficient of determination (R²adj.) of 0.92 and the lowest standard error of the estimate (40.91%). The residuals of this equation presented a homogeneous distribution. However, this equation still presented a clear tendency to overestimate dry biomass, mainly in the case of trees with smaller diameters (Figure 3).
The Shapiro-Wilk test and the Durbin-Watson statistic were not significant (p-value > 0.05), indicating that the residuals were normally distributed and were not correlated among themselves, while the White test was significant (p-value < 0.05), showing the presence of heteroscedasticity of residuals. To correct this problem, two alternatives were used: the first consisted of transforming the dependent variable, and the second of using weighted regression. The transformation of the dependent variable was chosen from the family of data transformations Y′ = Y^λ suggested by Box-Cox. In the present case, λ ranged from -1 to 1, in steps of 0.1. Homogeneity of residuals was not achieved even after the Box-Cox transformation. Because of that, we decided to use weighted regression, applying the weight

The best fit statistics were obtained when the Type 2 model was used to fit the set of equations to the stratified data. The adjusted coefficients of determination (R²adj.) were higher than those obtained with the other fitting procedures, reaching values greater than 0.90. Moreover, there was a clear reduction in the standard error of the estimate (Syx%), reaching errors close to 16% when the number of diameter classes was increased (Figure 4). Also, both the AIC and BIC values showed a significant reduction. However, these results occurred only in the case of trees of larger size, while the error remained high for trees of smaller diameter (Table 6). Data stratification using the ratio Ht/DBH proved to be the best strategy. In this situation, the errors were smaller than 50% and the graphical distribution of residuals was homogeneous when compared to the other strategies. Additionally, the adjusted coefficients of determination were acceptable for biomass estimation with the obtained equations.
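The Box-Cox search over λ mentioned earlier in this section can be sketched as a grid of 21 values from -1 to 1 in steps of 0.1. The code uses the scaled form (Y^λ − 1)/λ, with ln(Y) at λ = 0, which is the standard Box-Cox definition; the text writes the simpler power form Y′ = Y^λ, so the scaling is an assumption of this sketch. The criterion used to choose λ (for example, rerunning the White test on each refit) is not implemented, and the biomass values are made up.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def lambda_grid(start=-1.0, stop=1.0, step=0.1):
    """Lambda values from start to stop inclusive, rounded to avoid drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


grid = lambda_grid()
print(len(grid), grid[0], grid[-1])

# Made-up biomass values (kg), transformed for a few lambdas
biomass = [0.8, 2.5, 7.1, 19.4]
for lam in (-1.0, 0.0, 0.5, 1.0):
    print(lam, [round(box_cox(y, lam), 3) for y in biomass])
```

In a full analysis, the model would be refitted to the transformed response for each λ in the grid and the residuals tested again for heteroscedasticity.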

CONCLUSION
The Schumacher-Hall model should be used to estimate the total dry biomass in forest restoration areas when only the variables diameter at 1.30 m above ground and total height are used as independent variables.
The inclusion of the variable wood basic density does not provide significant gains in the accuracy of the equation, which, together with the difficulty of obtaining it, makes its use in mathematical models impractical. The average canopy diameter, commercial height and diameter at 1.30 m above ground, which are easily measured in the field, are the variables that best explain the variation in total dry biomass. They should be used as independent variables in the Type 2 model, because they contribute to a more accurate equation when estimating the dry weight of native species in forest restoration areas.
Finally, to promote gains in accuracy of the estimation of the total dry biomass in the Type 2 model, the model should be adjusted to data stratified into classes for the height-diameter ratio (Ht/DBH).

Figure 1. Graphical distribution of residuals of the single- and double-entry models fitted to estimate total biomass in strategy I.

Figure 2. Graphical distribution of residuals of the traditional models fitted to estimate total dry biomass in strategy II.

Figure 3. Graphical distribution of residuals of the models fitted according to strategies III (Chave 1 and 2) and IV (Type 1 and 2).

Figure 4. Graphical distribution of residuals of the Type 2 model fitted to the data stratified by Ht/DBH ratio (A and B), diameter classes (C, D and E) and ecological categories (F and G).

Table 1. Mathematical models fitted to estimate total biomass of trees in forest restoration areas.

Table 2. Descriptive statistics for the selected variables.

Table 3. Coefficients and fit statistics of the single- and double-entry models fitted according to strategy I.

Table 4. Coefficients and fit statistics of the single- and double-entry models fitted according to strategy II.

Table 5. Coefficients and fit statistics of the models used in strategies III (Models 13 and 14) and IV (Models 15 and 16).

Table 6. Coefficients and fit statistics of the Type 2 model applied to different stratification methods.