Predicting biomass in a Brazilian tropical dry forest: a statistical evaluation of generic equations

Dry tropical forests are a key component of the global carbon cycle, and their biomass estimates depend almost exclusively on equations fitted to multi-species or individual-species data. A systematic evaluation of these statistical models through validation of their aboveground biomass estimates is therefore justified. This study analyzed the capacity of generic and specific equations obtained from different locations in Mexico and Brazil to estimate aboveground biomass at the multi-species level and for four different species. Generic equations developed in Mexico and Brazil performed better in estimating tree biomass for multi-species data. For Poincianella bracteosa and Mimosa ophthalmocentra, only the generic equation of Sampaio and Silva (2005) is recommended. These equations show lower tendency and lower bias, and their biomass estimates are similar. For the species Mimosa tenuiflora and Aspidosperma pyrifolium and for the genus Croton, the specific regional equations are more recommended, although the generic equation of Sampaio and Silva (2005) is not ruled out for biomass estimates. Models considering genera, families, successional groups, climatic variables and wood specific gravity should be fitted and tested, and the resulting equations should be validated at both local and regional levels as well as at the scale of tropical regions dominated by dry forest.


INTRODUCTION
Among the categories of tropical and subtropical forests in the world, dry forests comprise just under half (Murphy and Lugo 1986, Sabogal 1992, Powers et al. 2009). Despite their importance, they are among the most threatened and least studied forest ecosystems and, as a result, may be at greater risk than humid forests (McLaren et al. 2005, Miles et al. 2006, Portillo-Quintero and Sánchez-Azofeifa 2010, Aide et al. 2012, Gillespie et al. 2012).
These forests are a key component of the global carbon cycle in the face of climate change (Návar-Cháidez 2014, Chidumayo and Gumbo 2010). While guidelines and recommendations from the Food and Agriculture Organization of the United Nations (FAO) and the Intergovernmental Panel on Climate Change (IPCC) generalize information about carbon stock and aboveground biomass for wet forests, there is still considerable uncertainty about the quantity and spatial variation of aboveground biomass and carbon in tropical dry forests (Návar-Cháidez 2014).
Several research efforts are underway to fill this gap, but all of them ultimately rely almost exclusively on destructive biomass measurements of individual trees to fit local and/or global models (Gibbs et al. 2007, Návar-Cháidez 2014). Alternatively, they rely on a combination of remotely sensed images at different scales to calibrate or validate equations (Fayolle et al. 2013).
An allometric equation is the result of fitting a statistical tree biomass model to a set of predictors, such as tree diameter and/or height, wood specific gravity, or forest type (Chave et al. 2005). Information on the types of models, allometric relationships and applications to different sites is extensively documented in Rojas-García et al. (2015).
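As an illustration of the definition above, the following minimal Python sketch fits a simple allometric equation by least squares on log-transformed data. The power-law form B = a·D^b, the function names and the example values are assumptions made for illustration only; they are not the equations evaluated in this study, and no correction for back-transformation bias is applied.

```python
import math

def fit_power_law(diameters_cm, biomass_kg):
    """Fit B = a * D**b by ordinary least squares on ln(B) = ln(a) + b * ln(D).

    ASSUMPTION: the power-law form is a common allometric model, used here
    only as an example; it is not taken from the equations in this study.
    """
    xs = [math.log(d) for d in diameters_cm]
    ys = [math.log(b) for b in biomass_kg]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope on the log-log scale
    a = math.exp(mean_y - b * mean_x)   # intercept, back-transformed
    return a, b

def predict_biomass(a, b, diameter_cm):
    """Predict biomass (kg) from diameter (cm) with the fitted power law."""
    return a * diameter_cm ** b

# Hypothetical diameter (cm) and biomass (kg) pairs, for illustration only.
example_d = [3.2, 4.8, 6.5, 9.1, 12.4]
example_b = [1.1, 3.0, 6.2, 14.8, 31.0]
a_hat, b_hat = fit_power_law(example_d, example_b)
print(a_hat, b_hat, predict_biomass(a_hat, b_hat, 8.0))
```

Fitting on the log scale keeps the example dependency-free; a nonlinear least-squares fit on the original scale would be an equally valid choice.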
One unresolved problem that still generates much discussion is the validation of these equations. In multi-species biomass data, different dendrometric amplitudes can lead to biased predictions at local or global levels (Chave et al. 2005). To avoid this bias and to address the lack of specific models for the dry forests of Brazil, three large studies (Sampaio and Silva 2005, Sampaio et al. 2010, Alves Junior et al. 2013) generated generic equations for both the community (multi-species) and individual species, and overcame those caveats by accounting for large datasets from sites within the same morphoclimatic domain (Db > 3 cm). However, the statistical validity of global generic equations for dry forests (Návar-Cháidez 2009a, 2009b, 2014) should be verified and compared with that of local equations, and the errors in biomass stock estimates should be measured.
Statistical validation is central to the responsible application of equations to scientific problems, and its importance is recognized by those who develop them and/or use them for inferences and predictive generalizations. However, there is little consensus about the best way to proceed, because statements in the literature are still confusing and often mutually exclusive (Rykiel 1996, Robinson and Froese 2004).

STUDY AREA
The study area is located in the municipality of Floresta, in the São Francisco mesoregion of Pernambuco, 433 km from the state capital, Recife. The study was carried out with data from an area under forest management for the production of wood destined for the steel industry, named Itapemirim farm, which belongs to Agroindustrial Excelsior S.A. The reference coordinates of the study area are 8°30'37"S and 37°59'07"W (Figure 1).
The average annual rainfall is approximately 400 to 500 mm, with a rainy season from January to April, and the average annual temperature is 26.1°C.
The municipality has an area of 3,643.97 km² and an average altitude of 323 m. The soil of the region is classified as a shallow Chromic Luvisol with a sandy to medium surface texture. On the valley slopes, gravelly but more fertile soils prevail (EMBRAPA 2007). The vegetation is predominantly Caatinga savanna, characterized by shrubby-arboreal vegetation with cacti and a herbaceous stratum (IBGE 2012).

SAMPLING
In this research, data from 507 trees of sixteen different species were used. All species are native to the studied forest and are of economic importance, since they are harvested for the production of charcoal, fence posts, furniture, etc. According to the forest inventory (Abreu et al. 2016), the species commonly found are: Poincianella bracteosa, Mimosa ophthalmocentra, Aspidosperma pyrifolium, Mimosa tenuiflora, Anadenanthera colubrina, Bauhinia cheilanta, Jatropha mollissima, Piptadenia stipulacea, Croton rhamnifolius, Croton blanchetianus, Cnidoscolus phyllacanthus, Manihot glaziovii, Poincianella calycina, Sapium lanceolatum, Thiloa glaucocarpa and Commiphora leptophloeos. These species are also widely distributed in the dry forests of the Brazilian northeast (Gariglio et al. 2010).
The biomass data were obtained by destructive sampling, in which the following were recorded: diameters at 0.30 m and 1.30 m above ground level (base diameter and diameter at breast height), total height, number of branches, diameter at the base of the largest branch and weight of the green mass. The green mass of each tree was obtained with a scale, adding the weights of the stem and the branches to form the total green weight per tree for all species (Abreu et al. 2016).
According to the equations found in the literature, four species-level data sets (P. bracteosa, M. ophthalmocentra, M. tenuiflora and A. pyrifolium) and one genus-level data set (Croton) were selected to compose the inputs, along with all the weighed trees in the sample (multi-species), to validate the equations.

VALIDATION OF BIOMASS EQUATIONS
Despite the importance of dry forests in terms of carbon sequestration and distributed area in the tropics, only a few adequate equations were found in the literature (Table I). Two studies (Návar-Cháidez 2009a, 2014) proposed equations for dry forests in Mexico. Two others (Sampaio and Silva 2005, Sampaio et al. 2010) developed equations for dry forests located in Brazil, considering both multi-species and individual species data. Local equations developed by Abreu et al. (2016) were also used in the validation, considering only the data of all species.
In this work, equations that use basic wood density as a predictor variable were not validated, nor were the generic pan-tropical equations developed by Brown (1997) and updated by Chave et al. (2005). These equations do not encompass the dendrometric amplitude of dry forests, although they are recommended by the IPCC guidelines (IPCC 2003, 2006) for estimating carbon stocks in tropical forests.
The validation analysis consisted of predicting the aboveground biomass for all trees and for the other cases analyzed, based on the coefficients of the equations. For this task, the following statistics, recommended by Mayer and Butler (1993) and Palahí et al. (2002), were computed:
Coefficient of determination (R²), where R² is the coefficient of determination; SQR is the covariance between observed and estimated biomass; and SQT is the covariance of the observed biomass. R² values indicate the total variation of the data explained by the validated equations.
Residual Standard Error (RSE):
$$RSE = \sqrt{\frac{\sum_{i=1}^{n} \left(B_{real,i} - B_{est,i}\right)^{2}}{n - p}}$$
where RSE is the residual standard error or the standard error of the estimate; $B_{real,i}$ is the actual individual biomass in kg; $B_{est,i}$ is the estimated individual biomass in kg; n is the number of sampled trees; and p is the number of parameters in the model. It reflects how effectively the biomass of a tree is estimated. High RSE values indicate inaccurate and biased equations in estimating biomass.
Bias (%) or relative trend, where $Y_i$ is the observed biomass value (kg) of trees per unit area, $\hat{Y}_i$ is the estimated biomass value (kg) of trees per unit area, and n is the number of observations. This statistic indicates a tendency of under- or overestimation and is a measure of the error and quality of the validated equations, so the lower the error, the greater the efficiency of the generalizations.
Akaike Information Criterion (AIC):
$$AIC = n \ln\left(\frac{SSE}{n}\right) + 2p$$
where SSE is the sum of squares of the errors, and p and n are as defined above. This criterion penalizes the addition of parameters to the analyzed functions. The best validated model minimizes the AIC value.
The paired t-test was also used at the 99% confidence level (α = 0.01) to test the hypothesis that the observed biomass (actual weight) and the obtained biomass by the validated equations are statistically similar.
All computations and analyses were carried out using R statistical software (R Core Team 2015).
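The validation statistics described in this section can be sketched in a few lines of code. The example below is a minimal Python illustration, not the R code used by the authors: the function name and the example data are hypothetical, the RSE and AIC follow the definitions given above, and the form of the relative bias (sum of residuals divided by the sum of observed values) is an assumption, since only its variable definitions are given in the text. The sign convention matches the results reported here, where negative bias indicates overestimation.

```python
import math

def validation_stats(observed, estimated, n_params):
    """Return RSE, AIC, relative bias (%) and the paired t statistic.

    observed, estimated: biomass values in kg, one pair per tree.
    n_params: number of parameters (p) of the validated equation.
    Requires at least one nonzero residual (SSE > 0).
    """
    n = len(observed)
    residuals = [o - e for o, e in zip(observed, estimated)]
    sse = sum(r * r for r in residuals)

    rse = math.sqrt(sse / (n - n_params))        # residual standard error
    aic = n * math.log(sse / n) + 2 * n_params   # AIC = n ln(SSE/n) + 2p

    # ASSUMPTION: relative bias as 100 * sum(obs - est) / sum(obs).
    # Negative values then indicate overestimation.
    bias_pct = 100.0 * sum(residuals) / sum(observed)

    # Paired t statistic on the differences (observed - estimated).
    mean_diff = sum(residuals) / n
    sd_diff = math.sqrt(sum((r - mean_diff) ** 2 for r in residuals) / (n - 1))
    t_stat = mean_diff / (sd_diff / math.sqrt(n))

    return {"RSE": rse, "AIC": aic, "Bias%": bias_pct, "t": t_stat, "df": n - 1}

# Hypothetical observed and estimated biomass values (kg), for illustration only.
observed_kg = [2.1, 5.4, 9.8, 15.2, 30.5]
estimated_kg = [2.5, 5.0, 10.6, 14.1, 33.0]
print(validation_stats(observed_kg, estimated_kg, n_params=2))
```

The t statistic would then be compared with the critical value of Student's t distribution with n - 1 degrees of freedom at α = 0.01; that lookup is omitted to keep the sketch free of external libraries.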

RESULTS
Figure 2 shows descriptive information about the base diameter and the biomass for the analyzed cases. Discrepant data were observed for the diameters, especially when considering all analyzed species. In relation to biomass, M. tenuiflora showed the greatest variation among the species and the Croton genus the smallest dispersion. The average aboveground biomass considering all species was 12.77 kg, with a standard error of the mean of 1.95 kg. The lowest biomass found was 0.15 kg and the highest was 559.5 kg. Of the 507 weighed trees, 75% of the observations had values of up to about 9.0 kg, indicating the presence of discrepant data for biomass as well (outliers not shown in the figure).
As for the estimates of aboveground biomass (Figure 3), the validation was consistent when using the equations of Návar-Cháidez (2009a) and the local equation of Sampaio and Silva (2005) for all species. The equation of Abreu et al. (2016), although local, was not suitable for the biomass estimates. For the species P. bracteosa and M. ophthalmocentra, the specific equations of Sampaio and Silva (2005) and Sampaio et al. (2010) were biased and overestimated the biomass of these species. However, the general equation of Sampaio and Silva (2005) gives reliable estimates for these species as well as for A. pyrifolium; in other words, the parameter estimates of this equation fall within the 95% confidence interval of the local biomass estimates.
For the Croton genus, species-level equations were more reliable in estimating biomass. The general equation of Sampaio and Silva (2005) estimates the biomass precisely for trees with diameters up to 5 cm and tends to overestimate it for trees with diameters above 6 cm.
In five analyzed cases, both the diameters (at 0.30 m and 1.30 m above the ground) and the height of the trees (the combined variable in the general equation of Sampaio and Silva (2005)) were important predictors of aboveground biomass. Species-specific equations derive from a simple nonlinear model (which does not include tree height and/or basic wood density as a predictor) and end up providing biased fits, as can be observed in the overestimation of biomass for P. bracteosa and M. ophthalmocentra (Figure 3). The results of the validation test show that the lowest values of Bias (%) are found in single-entry equations for four cases (Návar-Cháidez (2009a): all species; Sampaio et al. (2010), Sertânia: M. tenuiflora; Sampaio and Silva (2005): A. pyrifolium; Sampaio et al. (2010): Croton) (Table II).
The estimates of the local equation developed by Abreu et al. (2016) for all species in this study did not show a non-significant tendency; in other words, they showed a tendency of underestimation. These higher values of Bias (tendency) are consistent with the high values of the Akaike information criterion (AIC) and residual standard error (RSE), as well as with the non-significant values for the p-value (paired t-test with α = 0.01). The estimates obtained with the equations of Návar-Cháidez (2009a) and Sampaio and Silva (2005) did not differ significantly from the observed biomass values for all species; in addition, the validation of these equations explains more than 86% of the variation in the data.
Within species, the differences between the observed values and those estimated by the specific equations were extremely high for P. bracteosa and M. ophthalmocentra (Bias (%) = -37.7 and -55.7, respectively), evidencing overestimation by these equations. In these cases, the values of AIC and RSE are lower for the general equation of Sampaio and Silva (2005), for which the paired t-test indicates that the compared biomass values are statistically similar.
Although for M. tenuiflora, A. pyrifolium and species of the Croton genus all validated equations were significant by the paired t-test, the highest AIC and RSE values are still shown by the specific equations, mainly for M. tenuiflora and A. pyrifolium. This does not invalidate the use of these equations, since their values of Bias (%) were lower than those of the general equation of Sampaio and Silva (2005). For M. tenuiflora, the equations for both Serra Talhada and Sertânia (Sampaio et al. 2010) present similar values of linear and angular coefficients, indicating that the dendrometric characteristics of this species do not differ from those of the species in this study at the local scale. For A. pyrifolium, a lower trend was found with the specific equation of Sampaio and Silva (2005) (Bias (%) = 16.8), slightly lower than that obtained with the general equation of Sampaio and Silva (2005). For the Croton genus, the specific equation of Sampaio et al. (2010) shows a lower tendency of overestimation (Bias (%) = -9.6). However, the values of AIC and RSE do not invalidate the use of the general equation of Sampaio and Silva (2005) for these species.
In relation to the biomass distribution by diameter class (Figure 4), the confidence intervals show that the general equation of Sampaio and Silva (2005) was more efficient in most cases. Considering all species, from the 18.5 cm diameter class onward, a difference was noticed between the observed biomass values and those estimated by the equations. This result is due to the presence of discrepant data (outliers), where the biomass variability is higher than in the previous diameter classes. The validated equations were statistically satisfactory up to the 15.5 cm diameter class, where the data are most homogeneous. The exception was the equation of Abreu et al. (2016), which showed a difference in all classes for all analyzed cases.
For P. bracteosa and M. ophthalmocentra, the general equation of Sampaio and Silva (2005) generalized the estimates more efficiently in all diameter classes except the last one for both species. The specific equations overestimated biomass in all classes, corroborating the validation results, with statistical differences according to the confidence intervals.
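The comparison by diameter class described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class lower limit and width (2 cm and 3 cm), the normal multiplier of 1.96 for the 95% interval, the function name and the example data are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict

def class_summary(diameters_cm, biomass_kg, lower=2.0, width=3.0, z=1.96):
    """Mean biomass and approximate 95% interval per diameter class.

    ASSUMPTIONS: classes start at `lower` and are `width` cm wide; the
    interval is mean +/- z * standard error (normal approximation).
    Returns {class_center_cm: (mean, ci_low, ci_high)}.
    """
    groups = defaultdict(list)
    for d, b in zip(diameters_cm, biomass_kg):
        groups[int((d - lower) // width)].append(b)

    summary = {}
    for idx in sorted(groups):
        values = groups[idx]
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
            half_width = z * sd / math.sqrt(n)
        else:
            half_width = float("nan")  # no interval from a single tree
        center = lower + (idx + 0.5) * width
        summary[center] = (mean, mean - half_width, mean + half_width)
    return summary

# Hypothetical trees: base diameter (cm) and observed biomass (kg).
diam = [3.1, 4.2, 4.9, 5.5, 6.8, 7.4, 8.3, 9.9]
obs = [0.9, 1.8, 2.6, 3.3, 5.9, 7.2, 9.5, 14.0]
for center, (mean, low, high) in class_summary(diam, obs).items():
    print(center, round(mean, 2), round(low, 2), round(high, 2))
```

An equation's mean estimate for a class can then be checked against the observed interval for that class; an estimate outside the interval corresponds to the differences reported in the text.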
For M. tenuiflora, the efficiency of the general equation of Sampaio and Silva (2005)

DISCUSSION
In order to reach accurate estimates of plant biomass stock, biomass equations validated for local, regional and global circumstances are mandatory (IPCC 2006, Sato et al. 2015). Although some studies estimate aboveground biomass locally for the municipality of Floresta (Abreu et al. 2016), the validated equations from other sites generalize biomass values with lower bias. For example, the Návar-Cháidez (2009a) equation explains ≥ 79% of the variation in aboveground biomass in a dry forest eco-region in Mexico; in the validation with dry forest biomass data from Brazil, the explained variation is higher than 90%. The general equation of Sampaio and Silva (2005) explains ≥ 94% for dry forests of Pernambuco and Bahia; for the biomass variation in this study, it explains 86% of the data variation. This explained variation suggests that the ecological patterns of growth, development and establishment of the species are similar on a regional scale (Ceccon et al. 2006, Chidumayo and Gumbo 2010). Thus, both the equation developed in Mexico and that developed in Brazil using local data sets may reduce the uncertainty in biomass estimation in the municipality of Floresta.
These results confirm that the aboveground biomass of a tree can be obtained considering both the diameter and the product of the diameter with the height.In addition, the estimates of the parameters of local models by Sampaio and Silva (2005) for all species were not significantly different from the estimates of the parameters of the equations of Mexico.Models that include tree height improve biomass estimation in many tropical forests (Chave et al. 2005, Rutishauser et al. 2013).
Individually, for the species P. bracteosa and M. ophthalmocentra, these statements are corroborated, since better biomass estimates are obtained with the generic equation of Sampaio and Silva (2005), even considering the diametric distribution. The results of this work suggest two considerations: (1) these species tend to present differences in ecological patterns of development, even though the studied areas are different sites within the same region; or (2) the models were fitted to estimate the biomass of trees with different dendrometric amplitudes (van Breugel et al. 2011), so the resulting equations tend to overestimate the biomass of these species in this study when considering only the diameter variable.
Although height is an important variable in estimating the biomass of the studied dry forest, some studies show that estimates at individual or multi-species scales are more efficient when considering the basic density of wood (Deans et al. 1996, Baker et al. 2004, Chave et al. 2005, Vieilledent et al. 2012). This suggests a need to revise the IPCC guidelines (Aalde et al. 2006), since these guidelines recommend allometric equations that depend only on tree diameter (Fayolle et al. 2013).
In regard to biomass estimates for M. tenuiflora, A. pyrifolium and the Croton genus, the specific regional equations are the most suitable according to the statistical validation test. This shows a similar pattern in the biomass stocks of these species and the Croton genus on a regional scale. Although these results are contrary to those obtained with the equations validated for all species together and for P. bracteosa and M. ophthalmocentra, the diameter variable alone is sufficient to estimate aboveground biomass, even at the level of diametric distribution.
There is a gap to be filled regarding the validity across the tropics of global (pan-tropical) equations for dry forests, although some significant biases are reported by Kale (2004), Brandeis et al. (2006), Návar-Cháidez (2014), Sato et al. (2015) and Memiaghe et al. (2016). In tropical rainforests, the development of global models already provides significant evidence for estimates and global validations of aboveground biomass and carbon stock (Chave et al. 2005), although other studies suggest the correction of errors and the addition of variables to the models through different databases (Djomo et al. 2010, Henry et al. 2010, van Breugel et al. 2011, Vieilledent et al. 2012, Alvarez et al. 2012, Nogueira Lima et al. 2012).
The issue currently under discussion and addressed in this paper is whether, elsewhere in the tropics with dry forest dominance, where no species-specific or site-specific biomass equation is available, it would be better to use generic equations or to develop local equations. Although the choice of the equation is an important source of uncertainty in biomass estimates (Chave et al. 2004, Fayolle et al. 2013), there is a lack of clear guidelines for the selection of existing models.
On the one hand, Basuki et al. (2009) discuss the applicability of generic equations to the diverse structure and composition of tropical forests.Gibbs et al. (2007) in their review of methods to estimate the biomass of tropical forests argued that the effort required to develop biomass equations of specific species or sites (hotspots) would not normally improve the accuracy of biomass estimates.In this study, it can be observed that the specific equations for some species are not necessarily better than the generic equation (multi-species), which includes the total height of the tree as a predictive variable.
Contrary to the results of Basuki et al. (2009), the results of this work suggest the use of generic equations, or of site-specific equations from sites with similar characteristics, instead of the fitting of new models, even at the genus level only (Croton). This recommendation is made because the similarities among sites in tree biomass estimates are almost entirely driven by similarity in height and diameter patterns, as observed in Sampaio and Silva (2005). In this way, not only the diameter but also the tree height is an important factor that needs to be considered in order to improve forest biomass estimates (Feldpausch et al. 2011). Thus, the felling of trees to compose a significant sample for fitting generic and specific biomass models can be dispensed with.
The validation of generic equations, however, should be tested under particular environmental conditions, for example in dry forests in water stress situations or at different precipitation scales, which may restrict the allometric relationship between height and diameter (Nath et al. 2006).
In addition, in order to avoid extrapolations above or below the confidence intervals, a compatible dendrometric amplitude of the data should also be considered. A possible alternative for integrating biomass estimates based on forest inventory measurements would be the use of satellite images or even LiDAR technology (Estornell et al. 2011, 2012, Almeida et al. 2014). Biomass models considering genera, families, successional groups, climatic variables and wood specific density should be fitted and tested at both local and regional levels, as well as at the scale of tropical regions dominated by dry forest.
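The point above about using a compatible dendrometric amplitude can be sketched as a simple range check performed before an equation is applied. In this minimal Python illustration the function name, the calibration limits and the diameters are hypothetical; the actual diameter range of each published equation would have to be taken from its source.

```python
def split_by_calibration_range(diameters_cm, d_min_cm, d_max_cm):
    """Separate trees inside and outside an equation's calibration range.

    ASSUMPTION: d_min_cm and d_max_cm are the smallest and largest
    diameters used to fit the equation (hypothetical values below).
    """
    inside = [d for d in diameters_cm if d_min_cm <= d <= d_max_cm]
    outside = [d for d in diameters_cm if not (d_min_cm <= d <= d_max_cm)]
    return inside, outside

# Hypothetical inventory diameters (cm) and calibration limits (cm).
inventory = [2.5, 3.4, 7.9, 15.0, 31.2]
inside, outside = split_by_calibration_range(inventory, d_min_cm=3.0, d_max_cm=30.0)
print("within range:", inside)
print("extrapolated:", outside)
```

Trees flagged as extrapolated could be estimated with another equation or reported separately, so that their estimates are not mixed with those that fall within the fitted range.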

A wide variety of methods have been proposed and used in many different fields of study. In many cases, the choice of technique is limited by the potential uses and tests of the model, the types of data that the equation generates, or the availability of actual data. Validation techniques can be grouped into four main categories, namely: subjective assessment, visual techniques (graphics), deviance measures and statistical tests (Mayer and Butler 1993). Despite the interest in biomass accounting in the region, few studies compare or validate biomass equations in Brazilian dry forests (Pereira Junior et al. 2016). Our research is divided into two parts. First, a statistical validation and comparison of the available global and/or local equations was performed for an initial destructive sample of 507 trees. Second, we investigated how these equations generalize estimates of aboveground biomass stock in different species. The objective of this work was to provide predictions of aboveground biomass for a dry forest located in Pernambuco, Brazil, applying generic equations. Specifically, this work addresses: (1) whether site-specific equations for species and multi-species predict biomass better than generic equations; (2) whether the generic equations of dry forests not located in Brazil generate reliable estimates of biomass for our sites.
Table I notes: † = inventory data from two different sites, one in Pernambuco and the other in Bahia; i = aboveground biomass (kg); D b = base diameter in centimeters (0.30 m above ground level); DAP = diameter at breast height in centimeters (1.30 m above ground level); H = height; β i = parameters of the models; b i = coefficients obtained after the model adjustments; and R² = coefficient of determination.
ROBSON B. DE LIMA et al.

Figure 2 - Box plot of the base diameter and the aboveground biomass. The boxes represent the 25th and 75th percentiles; the dashed margins represent the 10th and 90th percentiles. The line represents the median and the points indicate the presence of discrepant data (outliers).
in the estimates is highlighted, because no diameter class of the species shows differences by the confidence interval. The specific equations performed well, except the Sampaio et al. (2010) Serra Talhada equation for the 13.05 cm class and the Sampaio et al. (2010) Sertânia equation for the 16.05 cm class. A. pyrifolium and the Croton genus show good estimates for the validated equations up to the third diameter class; the other classes show greater variability, resulting in differences by the confidence interval.

Figure 4 - Comparison of biomass stocks observed and validated by generic and specific equations by diameter class. The bars indicate the confidence interval (mean ± standard error) at 95% probability.

TABLE II - Errors, trends, qualities and paired t-test for aboveground biomass estimates of validated equations for multi-species and individual species data.
R²: coefficient of determination; Df: degrees of freedom; RSE: standard error of the estimate; AIC: Akaike's Information Criterion; Bias (%): relative trend; Statistic and p-value: paired t-test results between observed and estimated biomass values.