SPECIES-SPECIFIC EQUATIONS: GREATER PRECISION IN COMMERCIAL VOLUME ESTIMATION IN MANAGED FORESTS IN THE AMAZON

The objective of this study was to analyze the performance of species-specific equations (SSEs) relative to generic equations for Annual Production Units (GEAPUs) and for a Forest Management Area (GEFMA) in the Brazilian Amazon. A total of 29,119 trees from 43 species were inventoried, harvested, and measured for volume in ten APUs, with 10% of this total set aside for validation and comparison of the selected equations. After selection and validation, the equations (GEFMA, GEAPUs and SSEs) were compared using precision statistics, by contrasting estimated and observed volumes, and by residual analysis. The values of the precision statistics were clearly lower for the SSEs. When the estimates were contrasted with the observations, the trend lines for the SSEs lay near the average observed volume. The residuals generated by the SSEs were smaller than, and statistically different from, those of the GEFMA and GEAPUs in the majority of cases. The most important commercial species (M. huberi) had its volume overestimated by 10.6, 9.3 and 3.0% when the GEFMA, the GEAPUs, and the SSEs were applied, respectively. Among the species that generally had very large trees, H. petraeum had its volume underestimated by 15.7, 16.6 and 4.4% by the GEFMA, GEAPUs and SSEs, respectively. The greater precision of the SSEs is reflected in better forest management planning decisions with respect to operational and economic aspects. These results show that, besides being statistically valid, the SSEs are recommended for obtaining more precise estimates of commercial volume, especially since there is a great demand for reliable estimates for each individual species in forest management areas in the Amazon.
1University of the Midwest of Parana, Irati, Parana State, Brazil, ORCID: 0000-0002-6388-7679a, 0000-0001-9965-7851b, 0000-0003-4025-9562c, 0000-0002-1685-7864d
2University of Western Para, Santarem, Para, Brazil, ORCID: 0000-0002-3629-3437a


INTRODUCTION
Forest management requires careful planning, and estimation of commercial volumetric production is crucial to this process (Ribeiro et al., 2014; Tonini and Borges, 2015). In the Brazilian Amazon, the need for reliable estimates is gaining importance because the estimated volumetric stock is one of the principal pieces of information required by the public agencies responsible for forest management in order to issue the annual Logging Authorization (AUTEX) certificate to forest management companies operating in native forests (Brasil, 2009).
Estimates are primarily made using volume equations, and these can be generic or specific. A generic volumetric equation is constructed using data collected from various tree species, while a species-specific equation refers to an allometric equation developed from observations collected from a single tree species (Kora et al., 2018).
In the Brazilian Amazon generic equations developed based on an entire Forest Management Area (FMA) are commonly used, such as those developed by Rolim et al. (2006), Colpini et al. (2009), Thaines et al. (2010), Barreto et al. (2014), Silva and Santana (2014), Gimenez et al. (2015) and Tonini and Borges (2015). In the Tapajos National Forest (TNF), initiatives promoted by a forest management company have enabled the use of generic equations for Annual Production Units (APUs), and these equations are therefore restricted to a specific area. Although these equations have been compared to a generic equation used in the TNF (Gomes et al., 2018), there is still a need for comparisons with equations that are specific for species.
Gains in precision have been observed from the use of equations developed for smaller areas, possibly as a function of a reduction in environmental variation (Mauya et al., 2014; Vibrans et al., 2015; Kachamba and Eid, 2016; Gomes et al., 2018). However, in natural tropical forests, the great heterogeneity in species composition and structure, even in small areas, represents an important challenge for the development of volumetric equations (Akindele and LeMay, 2006; Soares et al., 2011). According to these authors, data stratification by species is one of the principal alternatives for obtaining more precise volume estimates.
It is important to consider that, according to the Logging Authorization (AUTEX), the maximum volume authorized for harvesting is specific to each species (Brasil, 2009). Variation in these volumes depends on, among other factors, the estimated production presented by the forest management company to the environmental regulation agency. Precise estimates must therefore be made for each individual commercial species. The more accurate the estimates for each species, the smaller the discrepancy between the authorized volume and the harvested volume.
Although some studies have developed equations specific to some Amazonian commercial species (Lima et al., 2014; Ribeiro et al., 2014; Cysneiros et al., 2017; Santos et al., 2019), specific equations have not been tested for the majority of species that are currently managed. Furthermore, in this region species-specific equations have rarely been compared to generic ones, as has been done in other regions and continents (Guendehou et al., 2012; Vibrans et al., 2015; Goussanou et al., 2016; Kora et al., 2018). Therefore, besides developing species-specific equations that are appropriate for Amazonian species, it is necessary to evaluate their performance in relation to generic equations.
In this context, the objective of this study was to analyze the performance of species-specific commercial volume equations in relation to generic ones for an FMA and for APUs in a managed forest in the TNF in the eastern Brazilian Amazon. The hypothesis tested was that species-specific equations are more precise, and are therefore more appropriate for use in the Amazon region than generic ones.

Study area
This study was conducted in the Tapajos National Forest (TNF), which is a federal Conservation Unit (CU) located in the western region of the State of Pará, along the Santarem-Cuiabá (BR163) highway, and is part of the municipalities of Belterra, Aveiro, Placas and Ruropolis, with geographic coordinates 2º 45´ to 4º 10´ S and 54º 45´ to 55º 30´ W (Figure 1). The CU occupies an area of approximately 544,927 ha, of which about 32,000 ha are reserved for a community forest management concession (Forest Management Area, FMA). The vegetation in the CU is classified as Ombrophilous Dense Forest and is characterized by the dominance of large individual trees, palms and epiphytes, with a uniform canopy or with emergent trees (Gonçalves and Santos, 2008).

Data collection
The data used in this study were from 29,119 trees (50.0 cm ≤ DBH ≤ 175.0 cm) of 43 commercial species from 10 APUs (03 to 12) managed from 2008 to 2017 in the FMA of the TNF (Figure 1). The area of the APUs varied from 521 to 1,723 ha, totaling approximately 11,136 managed hectares. The data were collected through 100% forest inventories (census of all trees with DBH ≥ 50 cm) and rigorous volumetric measurements of logs.
In the 100% forest inventories, besides species identification by common regional name, diameter at 1.3 m above the soil surface (DBH) and visually estimated commercial height (h_c) were recorded for commercial trees. Volume was obtained through rigorous volumetric measurements using the Smalian method. Volumes of individual logs were measured first, and their sum, discounting the volume of hollows (when present), composed the commercial volume (v_c) of the stem.
Besides DBH, h_c (visually estimated during the inventory) was used for modeling of v_c, although h_c was also measured as the sum of the log lengths during the rigorous volumetric measurements. This procedure was justified by the fact that the height obtained from the sum of the logs was significantly different from the h_c visually estimated during the inventory (Gomes et al., 2018). Since the h_c estimated in the inventory is the measurement used as input for the selected volumetric equation, this measurement was also chosen for the adjustment of the volumetric models.
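As a rough illustration of the Smalian calculation described above, the sketch below computes the volume of a log as the mean of its two end cross-sectional areas multiplied by its length, sums the logs of a stem, and subtracts the volume of hollows. The function names, the input layout, and the choice to measure a hollow in the same way as a log are assumptions made for illustration; they are not the field protocol of the study.

```python
import math

def cross_section_area(diameter_cm):
    """Cross-sectional area in m² from a diameter given in cm."""
    return math.pi * (diameter_cm / 100.0) ** 2 / 4.0

def smalian_volume(d_base_cm, d_top_cm, length_m):
    """Smalian formula: mean of the two end areas times the length (m³)."""
    mean_area = (cross_section_area(d_base_cm) + cross_section_area(d_top_cm)) / 2.0
    return mean_area * length_m

def commercial_volume(logs, hollows=()):
    """Sum of log volumes minus hollow volumes.

    Each log or hollow is a tuple (d_base_cm, d_top_cm, length_m).
    """
    gross = sum(smalian_volume(*log) for log in logs)
    hollow = sum(smalian_volume(*h) for h in hollows)
    return gross - hollow

# Hypothetical stem with two logs and one small hollow (illustrative values only)
logs = [(80.0, 72.0, 6.0), (72.0, 65.0, 5.5)]
hollows = [(10.0, 8.0, 2.0)]
v_c = commercial_volume(logs, hollows)
```

Summing the log lengths in `logs` would give the measured commercial height that the text contrasts with the visually estimated h_c.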

Data organization
The dataset was separated into three categories to obtain three different types of equations, which differed in their scope: (1) for the FMA, independent of the APUs and species (generic equation for the FMA, GEFMA); (2) by APU, independent of the species (generic equations for the APUs, GEAPUs); and (3) by species (species-specific equations, SSEs). To obtain the GEFMA, all trees were used, which therefore involved all the APUs and all species. To obtain the GEAPUs and SSEs, the data were stratified by APU and by species, respectively.
About 10% of the sample trees were previously selected to compose the validation dataset, and for this reason, these trees were not included in equation adjustment. This selection was done randomly, but proportional to the number of trees in each diameter class (DBH). The selection was done for each species, and for validation of the generic equations the data were compiled into their respective categories (FMA and APUs).
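The random selection proportional to diameter class can be sketched as follows. This is a minimal illustration that reserves about 10% of the trees within each combination of species and DBH class; the 10 cm class width, the fixed seed, the record layout and the function name are assumptions, since the text does not state them.

```python
import random
from collections import defaultdict

def split_validation(trees, fraction=0.10, class_width_cm=10.0, seed=42):
    """Reserve about `fraction` of the trees in each (species, DBH class) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for tree in trees:
        dbh_class = int(tree["dbh"] // class_width_cm)
        strata[(tree["species"], dbh_class)].append(tree)

    adjustment, validation = [], []
    for key in sorted(strata):
        group = strata[key][:]
        rng.shuffle(group)
        # Very small strata may contribute no validation tree after rounding
        n_val = round(len(group) * fraction)
        validation.extend(group[:n_val])
        adjustment.extend(group[n_val:])
    return adjustment, validation

# Hypothetical inventory records (illustrative values only)
example_trees = [
    {"id": i, "species": "M. huberi", "dbh": 50.0 + (i % 40)} for i in range(200)
]
adjustment_set, validation_set = split_validation(example_trees)
```

For the generic equations, the validation trees selected in this way would simply be pooled by APU or over the whole FMA, as the text describes.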
The list of species and the number of trees in each dataset (adjustment and validation) are shown in Table 1.

SANTOS, et al
The data distributions for adjustment and validation, shown as the relationship between v_c and DBH, are presented in Figure 2 for the 36 species with n > 30, and in Figure 3 for the group "Others" (species with n < 30).

Selection and validation of the equations
Among the volumetric models commonly used for tropical forests (Guendehou et al., 2012; Vibrans et al., 2015; Cysneiros et al., 2017; Gimenez et al., 2017; Tsega et al., 2018), four were tested for equation selection in the three categories (GEFMA, GEAPUs and SSEs). Two single-input and two double-input models were used: Model 1 (Kopezky & Gehrhardt), Model 2 (Husch), Model 3 (Spurr) and Model 4 (Schumacher & Hall), where: v_c = commercial volume, in m³; DBH = diameter at breast height (measured at 1.30 m above the soil), in cm; h_c = commercial height, in m; b_0, b_1 and b_2 = regression coefficients to be estimated; ln = natural (Napierian) logarithm; and ε_i = random error.

FIGURE 2 Data distribution for the adjustment and validation for the 36 species with n > 30.

FIGURE 3 Data distribution for the adjustment and validation for the group "Others" (set of 7 species with n < 30).

The coefficients of the volumetric models were estimated by the least squares method (LSM), using the 'lm' function of the software R version 3.6.0 (R Core Team, 2019). The significance of the coefficients was evaluated using a t-test (α = 0.05) in the regression.
For evaluation of the adjustments and selection of the best equations, the following were used: the adjusted coefficient of determination (R²_aj), which expresses the quantity of the total variation that is explained by the regression (Campos and Leite, 2017); the standard error of the estimate in absolute terms (S_yx, m³) and as a percentage (S_yx%), which indicates the quality of the adjustment and how much the model errs on average when estimating the dependent variable (Machado et al., 2008); and the graphic dispersion of percent residuals (Res.% = ((v_observed - v_estimated)/v_observed) × 100), to reveal possible biases in the estimates (Campos and Leite, 2017). The R²_aj and the S_yx were recalculated in arithmetic units in the case of logarithmic models.
The four models were tested separately for the FMA, for each of the APUs, and for each of the 36 species, as well as for the group "Others". After the evaluation of the adjustments, 48 equations were selected and compared.
Before the selected equations in the three categories were compared, they were submitted to statistical validation. In this process, each equation was used to estimate volumes in the validation dataset, and a paired t-test (α = 0.05) was used to compare the estimated volumes with their respective observed values. When logarithmic equations were used, the estimates were made on the original volume scale. The null hypothesis tested was that the estimated and observed volumes were statistically equal.

Comparison of the selected equations
To compare the selected equations in the three categories (GEFMA, GEAPUs and SSEs), they were applied to the validation dataset separately for each species. The precision of the estimates in relation to the observed volumes was measured using the following percentage statistics (Campos and Leite, 2017): Mean Squared Root Error (MSRE%) (Equation 5), Bias (B%) (Equation 6), and the Average Absolute Difference (AAD%) (Equation 7). For these statistics, the nearer the values are to zero, the better the performance of the equation, where: y_i = observed commercial volume of the i-th tree, in m³; ŷ_i = estimated commercial volume of the i-th tree, in m³; ȳ = average observed commercial volume of the sample trees, in m³; n = number of observations.
Furthermore, direct comparisons of the equations were made for (1) the entire validation dataset, independent of APU and species; (2) the ten species of greatest commercial importance (largest volumes harvested) (A. lecointei, Couratari sp., H. impetiginosus, H. courbaril, H. parvifolia, L. lurida, M. huberi, M. itauba, P. bilocularis and V. maxima); and (3) five species that commonly have high errors in estimates, especially due to large variability in the data (C. odorata, C. cateniformis, H. petraeum, P. suaveolens and Terminalia sp.).
For the direct comparisons, the volumes estimated by the selected GEFMA, GEAPUs and SSEs were contrasted with the observed volumes. A trend line for the estimates was generated using the linear relationship between the observed and estimated volumes. Additionally, the residuals of the estimates were graphically analyzed and submitted to Analysis of Variance (α = 0.05). Subsequently, the Student-Newman-Keuls (SNK) test was applied for comparison of means. All analyses were performed using R software (R Core Team, 2019).
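The direct comparison can be sketched in a few lines. The example below fits a trend line of estimated on observed volumes and computes a one-way ANOVA F statistic for the residuals of the three equation types. It is only an outline under assumptions: the authors worked in R, the choice of observed volume as the x variable is a guess, the data are invented, and the SNK means comparison is not reproduced here.

```python
from statistics import mean

def trend_line(observed, estimated):
    """Least squares line estimated = intercept + slope * observed."""
    mx, my = mean(observed), mean(estimated)
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    return my - slope * mx, slope

def anova_f(groups):
    """One-way ANOVA: F statistic with between- and within-group degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical volumes (m³) and residuals (m³) for one species
observed = [3.2, 4.8, 6.1, 7.9, 10.4]
estimated = [3.5, 4.6, 6.4, 7.5, 10.9]
intercept, slope = trend_line(observed, estimated)

residuals = {
    "GEFMA": [-0.9, -0.4, -1.1, -0.6, -1.3],
    "GEAPUs": [-0.7, -0.5, -0.9, -0.4, -1.0],
    "SSEs": [-0.1, 0.2, -0.3, 0.1, -0.2],
}
f_stat, df_between, df_within = anova_f(list(residuals.values()))
```

An unbiased equation would give an intercept near 0 and a slope near 1; the F statistic is then compared with the critical value of the F distribution at α = 0.05 for the returned degrees of freedom.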

Selection and validation of the equations
Since a large number of equations was generated, and the principal objective of this study is to compare the best equations, only the equations selected in each of the three categories are shown in Table 2. The complete list of tested equations, with their respective adjustment and precision statistics, is found in the supplementary information (Appendix A). Graphical distributions of the residuals of the selected equations are also presented in a supplementary document (Appendix B).
All the selected equations have just DBH as an independent variable (Table 2). The adjustment and precision statistics, as well as the graphical analysis of the percent residuals, indicate that these equations performed similarly to the equations that included both DBH and h_c. This is possibly because commercial height carried non-sampling error, owing to the difficulty of measuring this variable accurately, which compromises the precision of the volumetric estimates.
The generic equations (GEFMA and GEAPUs) had values of R²_aj varying between 0.64 and 0.76, and S_yx (%) varying between 29.2 and 33.8%. In general, there were improvements in the statistics for the SSEs, except for a few species. Among the species-specific equations, the equation for the group "Others" had the highest error, which reflects the large variability of the data grouped for seven species. Without considering the equation for the "Others" group, the R²_aj varied from 0.30 to 0.79 between species, while the S_yx (%) varied between 18.1 and 36.4% (Table 2). The estimated coefficients of all the selected equations were significant (p ≤ 0.05), according to the t-test from the regression.
All the selected equations were submitted to the validation test. The paired t-test (α = 0.05) showed that the equation selected for the FMA was not statistically valid (p = 0.0019) for the estimates of species' commercial volumes. With respect to the generic equations by APU, the observed and estimated volumes were not statistically equal (p ≤ 0.05) only for APUs 03 and 04. In contrast, all the species-specific equations were valid for the volumetric estimates (p > 0.05), and were therefore considered adequate for use. Despite this result for the generic equations, the selected equations in the three different categories of data were compared. Table 3 shows the statistics for the analysis of estimate precision calculated by species, and consequently serves to compare the performance of the selected equations in the three different categories.
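For reference, the four named models are usually published in the forms below. These are the common textbook forms, stated here as an assumption: the text defines the symbols but does not show the equations, and Spurr's model in particular also circulates in a logarithmic variant, so the exact forms fitted in this study may differ.

```latex
\begin{align}
\text{Model 1 (Kopezky \& Gehrhardt):}\quad & v_c = b_0 + b_1\,DBH^{2} + \varepsilon_i \\
\text{Model 2 (Husch):}\quad & \ln v_c = b_0 + b_1 \ln DBH + \varepsilon_i \\
\text{Model 3 (Spurr):}\quad & v_c = b_0 + b_1\,DBH^{2}\,h_c + \varepsilon_i \\
\text{Model 4 (Schumacher \& Hall):}\quad & \ln v_c = b_0 + b_1 \ln DBH + b_2 \ln h_c + \varepsilon_i
\end{align}
```

Under these forms, Models 1 and 2 are the single-input (DBH only) models and Models 3 and 4 are the double-input models that add h_c.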
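The adjustment statistics reported here can be illustrated with a small sketch. It fits a Husch-type single-input model on the logarithmic scale by ordinary least squares, back-transforms the predictions, and then computes R²_aj and S_yx in arithmetic units, as the methods describe for logarithmic models. The data are invented, the helper names are hypothetical, and no correction for logarithmic bias is applied, since the text does not say whether one was used.

```python
import math

def fit_simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def fit_statistics(observed, estimated, n_coefficients):
    """Adjusted R² and standard error of the estimate (absolute and percent)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_coefficients)
    syx = math.sqrt(ss_res / (n - n_coefficients))
    return r2_adj, syx, 100.0 * syx / mean_obs

# Hypothetical trees: DBH in cm, commercial volume in m³
dbh = [52.0, 60.0, 71.0, 85.0, 98.0, 120.0]
vol = [2.1, 3.0, 4.4, 6.5, 8.1, 12.9]

# Fit on the log scale, then evaluate on the original volume scale
b0, b1 = fit_simple_ols([math.log(d) for d in dbh], [math.log(v) for v in vol])
vol_hat = [math.exp(b0 + b1 * math.log(d)) for d in dbh]
r2_adj, syx, syx_pct = fit_statistics(vol, vol_hat, n_coefficients=2)
```

Computing the statistics on back-transformed volumes is what makes a logarithmic model comparable with an arithmetic one such as Kopezky & Gehrhardt.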
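The validation test can be outlined as below: the paired t statistic is the mean of the differences between observed and estimated volumes divided by its standard error, with n - 1 degrees of freedom. The function name and the data are illustrative assumptions; the p-values reported in the text would come from the Student t distribution, which a statistics package provides.

```python
import math
from statistics import mean, stdev

def paired_t(observed, estimated):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [o - e for o, e in zip(observed, estimated)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical validation trees (volumes in m³)
observed = [2.0, 3.5, 5.1, 6.8, 9.0]
estimated = [2.2, 3.3, 5.4, 6.5, 9.1]
t_stat, df = paired_t(observed, estimated)

# Two-sided critical value of Student's t for df = 4 at alpha = 0.05
T_CRITICAL_DF4 = 2.776
equation_is_valid = abs(t_stat) < T_CRITICAL_DF4
```

An equation is considered valid when the null hypothesis of equal observed and estimated volumes is not rejected.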

Comparison of the selected equations
The species-specific equations (SSEs) show much lower values of MSRE (%), B (%) and AAD (%). This can also be seen in summarized form in the weighted averages of the statistics at the end of Table 3. Furthermore, B (%) varied widely between species in its tendency to under- and overestimate: from -101.9 to 26.4% for the GEFMA, from -84.2 to 21.2% for the GEAPUs, and from -13.7 to 9.9% for the SSEs. From this perspective it can be concluded that, besides being statistically valid, the species-specific equations can generate more precise estimates when they are used for trees that were not part of the adjustment procedure, which is the principal objective of volumetric modeling.
In the direct comparison of the equations, contrasting the estimates with the observed values for the entire validation dataset, the trend lines were near the average (Figure 4 - Left). This occurred because over- and underestimates compensated for each other, since the number of trees was large. Although the three types of equations showed a tendency to overestimate volumes (Figure 4 - Center), the ANOVA showed a significant difference between the residuals (Figure 4 - Right). The means test then revealed that the residuals generated by the SSEs were, on average, smaller than and different from those generated by the other equations (GEFMA and GEAPUs).
In the comparison of the equations for the ten most important commercial species, the contrast of the estimated and observed values showed that the SSEs generated more precise estimates, with trend lines nearer to the mean for most species (Figure 5 - Left). The analysis of the distribution of the residuals revealed gains in estimate precision by the SSEs for most species (Figure 5 - Center), although these were slight.
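The three precision statistics can be computed as in the sketch below. The formulas are the common definitions, each expressed as a percentage of the mean observed volume, and are given as an assumption because the equations themselves are not shown in the text. The sign convention (observed minus estimated) follows the definition of the percent residual in the methods; under it, a negative B (%) indicates overestimation.

```python
import math

def precision_statistics(observed, estimated):
    """Return (MSRE%, B%, AAD%) relative to the mean observed volume."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [o - e for o, e in zip(observed, estimated)]
    msre_pct = 100.0 / mean_obs * math.sqrt(sum(err ** 2 for err in errors) / n)
    bias_pct = 100.0 / mean_obs * sum(errors) / n
    aad_pct = 100.0 / mean_obs * sum(abs(err) for err in errors) / n
    return msre_pct, bias_pct, aad_pct

# Hypothetical validation trees of one species (volumes in m³)
observed = [4.0, 6.5, 9.2, 12.1]
estimated = [4.6, 6.9, 9.0, 13.5]
msre_pct, bias_pct, aad_pct = precision_statistics(observed, estimated)
```

Because over- and underestimates cancel in B (%) but not in MSRE (%) or AAD (%), an equation can show a bias near zero while still having large errors for individual trees.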
The superiority of the SSEs becomes evident when comparing the residual means. The ANOVA showed a significant difference (p < 0.05) for all ten species except H. impetiginosus and H. parvifolia (Figure 5 - Right). The means comparison test showed that the residuals generated by the SSEs were, on average, different from and nearer to zero than the residuals generated by the GEFMA and GEAPUs. Consequently, the trends of over- and underestimation were reduced when the SSEs were used. The species M. huberi had the largest volume harvested in the ten APUs (approximately 24% of the total volume), and had its total volume in the validation dataset overestimated by 10.6, 9.3 and 3.0% when the GEFMA, the GEAPUs and the SSE were used, respectively. Couratari sp., in contrast, with the second largest volume harvested, had its volume underestimated by 10.4 and 10.1% when the GEFMA and the GEAPUs were used, respectively; however, when the SSE was used for this species the volume was underestimated by just 0.8%.
As shown in Table 2, even the best equations developed for C. odorata, C. cateniformis, H. petraeum, P. suaveolens and Terminalia sp. had high estimate errors (S_yx > 30%). This is a function of the large variability in the data for these species, especially for large trees, which complicates model adjustment. In the comparison of the equations for these five species there are relevant gains in precision when the SSEs are used.
When contrasting the generated estimates with the observed values, the trend lines for the estimates made by the SSEs were nearer to the mean (Figure 6 - Left). The distribution of the residuals showed a reduction in tendencies to under- and overestimate volumes using the SSEs (Figure 6 - Center). The ANOVA indicated a significant difference between the residuals generated by the three types of equations, except for the species C. cateniformis (Figure 6 - Right). The means comparison test showed that the SSEs were different from the generic equations (GEFMA and GEAPUs), with relevant gains in precision (Figure 6 - Right). Despite the non-significant ANOVA for the residuals of C. cateniformis, the SSE was more precise, both in the comparison of estimated volumes and in the evaluation of the residuals.

CERNE SANTOS, et al
H. petraeum, one of the commercial species with the largest trees (mean DBH = 109 cm), commonly has its volume underestimated. However, when the SSE was used for this species the underestimation was just 4.4%, while for the GEFMA and the GEAPUs it was 15.7 and 16.6%, respectively. Similarly, C. cateniformis (mean DBH = 105 cm) had its volume underestimated by 9.4 and 10.7% by the GEFMA and GEAPUs, respectively, while the SSE overestimated the volume of this species by 0.3%.

DISCUSSION
For the selection of the three types of equations (GEFMA, GEAPUs and SSEs), the single-input equations were chosen since these had similar performance to the double-input equations. Although not very common, single-input generic equations have been tested and recommended for commercially managed species in diverse regions of the Amazon (Barros and Silva Júnior, 2009; Thaines et al., 2010; Barreto et al., 2014; Tonini and Borges, 2015; Gimenez et al., 2015; Gimenez et al., 2017). Results from these studies have shown that the use of single-input generic equations reduced inventory time and cost, and one of the principal advantages is that the intrinsic non-sampling error of the visual estimates of h_c is eliminated.
In tropical forest ecosystems it is often difficult to measure commercial tree height with precision due to the presence of various strata and a closed canopy. Models that use only DBH as an explanatory variable are therefore useful in this case and have shown good results (Segura and Kanninen, 2005;Goussanou et al., 2016;Gimenez et al., 2017;Kora et al., 2018). When it is possible to rigorously measure commercial tree height in forest inventories, the use of double input equations should be prioritized.
It should be emphasized that the large errors in the estimates, principally for the selected generic equations, are due, in large part, to heterogeneity in the dendrometric variables of the species. These results were also reported by other studies conducted in the Amazon (Hiramatsu, 2008; Cysneiros, 2016). As in the studies by these authors, high S_yx and low R²_aj are normally linked to the use of a large number of sample trees.
The SSEs for certain species such as C. odorata, C. cateniformis, Couratari sp., H. petraeum, O. costulata, P. suaveolens, Terminalia sp., V. maxima, and the "Others" group also had high errors (S_yx > 30%), similar to those for the generic equations. This is probably because these species have large structural variability, a common characteristic of large trees, for which the largest errors in volumetric modeling of tropical species occur (Brandeis et al., 2006).
The validation of the best equations indicated that, although the generic equations had satisfactory performance with respect to the adjustment and precision statistics and residual evaluation, they would be inadequate for use in a new dataset because they can produce biased estimates. The species-specific equations, however, have the advantage of being validated by the t-test, indicating that their estimated volumes were statistically equal to the observed volumes.
Comparing the generic and species-specific equations through precision statistics revealed relevant variation between species in the tendency of the generic equations to under- and overestimate, as indicated by the values of B (%). This is a characteristic of generic equations when they are used to estimate the volume of individual species (Akindele and LeMay, 2006). This could be problematic for forest management planning since each species has specific restrictions, such as its authorized harvest volume. Furthermore, when species estimates are imprecise, the prediction of the total production of a forest is incorrect.
The direct comparisons through contrasts of estimated and observed volumes for the entire dataset indicated that the errors of under- and overestimation tended to compensate for each other due to the large quantity of data. This suggests that the generic equations are as efficient as the specific ones at making volume estimates for a set of data without stratification by APU or by species. However, as previously stated, species-based estimates are necessary for forest management in the Amazon, which increases the importance of using species-specific equations. Furthermore, the evaluation of the residuals through ANOVA and the means comparison indicated lower errors from the SSEs.
In this study, the precision statistics also demonstrated a gradual reduction in the estimated errors due to the stratification of data by APU and, principally, by species. The greater precision of the SSEs compared to the GEFMA and the GEAPUs indicates that the results of individual evaluation using precision statistics are confirmed by direct comparison of equations. Volumes of the species of greater commercial importance and of those with large errors in their volume estimates were estimated with greater precision by the SSEs, which could possibly also occur for the remainder of the species in this study.
Species that have a high market demand, and that consequently have greater harvest volumes, need precise equations since systematic errors in estimates for these species represent under- or overestimation of a large quantity of cubic meters of commercial volume. Furthermore, for species that commonly have very large-sized individuals (average DBH > 100 cm), volumetric modeling is particularly challenging (Brandeis et al., 2006; Cysneiros, 2016) and generic equations normally tend to under- or overestimate volumes. However, the results of the current study also show that important gains in precision can be achieved for these species by using SSEs. In forest management areas, this is particularly important because exceptionally large trees represent a large portion of total commercial volume.
The results confirm the importance of reducing variation in the data to obtain more efficient equations, as emphasized by Finger (2006). The principal advantage of stratifying the data by species was an increase in the correlation between v_c and DBH, which contributed to a better adjustment of the models to the data. In certain cases, dataset stratification by species may be unfeasible from the point of view of sample representativeness. However, this was not a problem in this study since, after stratification, most species remained well represented with sample trees from across the species' diameter range.
Despite the greater ease of obtaining and using generic equations, they should be used with caution in commercial tropical forests. In the Amazon, besides their use being predominant, for many years an equation based on a form factor of 0.7, recommended by Heinsdijk and Bastos (1963), and generic equations adjusted using data from the TNF have been applied widely across a diversity of sites.
In recent years, equations have been adjusted for specific management areas in the Amazon (Rolim et al., 2006; Barros and Silva Júnior, 2009; Thaines et al., 2010; Barreto et al., 2014; Silva and Santana, 2014; Gimenez et al., 2015). However, these equations are generally applied without distinction to all species in all production units that are managed annually. Furthermore, in the majority of cases, they were developed using a small number of trees measured in specific places in management areas, which reduces their representativeness.
In the FMA in the TNF, Gomes et al. (2018) reported that an equation that was adjusted specifically for an APU was more precise than a general equation for the TNF and an equation related to the average form factor. Similar results were found by other studies when equations were developed for smaller areas (Mauya et al., 2014; Vibrans et al., 2015; Kachamba and Eid, 2016). These results indicate that there is a gain in precision when the data are stratified by area, possibly due to a reduction in edaphoclimatic variation. Even though there is a gain in precision, the heterogeneity between species can still be a limiting factor for the generation of adequate equations, and should therefore be taken into consideration.
Kora et al. (2018) compared species-specific volume equations with a generic equation in Benin in west Africa and found greater precision for the specific equations. The authors reported that the generic equation had difficulty in estimating volume with precision, even though it was developed using data from the same forest ecosystem (the same edaphoclimatic conditions). Similar results were reported by Guendehou et al. (2012) and Goussanou et al. (2016) in the same region.
In the Brazilian Amazon, Cysneiros et al. (2017) tested generic equations for 32 commercial species, and species-specific equations for 12 principal species. Although direct comparisons of the estimates made by the selected equations were not done, the authors found better performance of species-specific equations through adjustment and precision statistics.
In the state of Amazonas, Krainovic et al. (2017) compared an equation specific for Aniba rosaeodora Ducke with a general equation based on the average form factor, which is commonly used in the Brazilian Amazon. These authors found that the general equation overestimated observed volumes by 32.8%, while the specific equation overestimated volume by just 0.15%. The use of a general equation with a single form factor for all situations could explain the elevated error generated by this equation.
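A form-factor equation of the kind discussed here can be written in a few lines. The sketch below multiplies the volume of a cylinder with the tree's DBH and commercial height by a single form factor; the default of 0.7 is the value attributed in this text to Heinsdijk and Bastos (1963). The function name and the example trees are assumptions, and the exact equations used in the cited studies may differ.

```python
import math

def form_factor_volume(dbh_cm, height_m, form_factor=0.7):
    """Cylinder volume from DBH and commercial height, scaled by a form factor (m³)."""
    basal_area_m2 = math.pi * (dbh_cm / 100.0) ** 2 / 4.0
    return basal_area_m2 * height_m * form_factor

# Two hypothetical trees with the same DBH and height but different stem shapes
# receive the same estimate, whatever their true taper
volume_species_a = form_factor_volume(90.0, 18.0)
volume_species_b = form_factor_volume(90.0, 18.0)
```

Because the factor is fixed, any difference in stem form between species goes directly into the estimation error, which is consistent with the explanation given in the text for the high error of the general equation.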
Various factors can explain the difficulty of generic models in providing precise estimates, such as biophysiological properties of species and edaphoclimatic conditions (Goussanou et al., 2016). The inter-species variation in form factors of tropical tree stems (Larson, 1963;Silva et al., 1994) can make the generation of efficient generic equations difficult. Therefore,