Robust volumetric models for supporting the management of secondary forest stands in the Southern Brazilian Atlantic Forest

The majority of Atlantic Forest fragments in Southern Brazil are second-growth forests dominated by fast-growing species with considerable market-value timber. Nevertheless, volume prediction models are scarce, especially to estimate tree total volume (i.e., stem plus branches). This study approached the issue through the following aims: to fit and select stem and total volume models (generic and species-specific) using data from 288 harvested trees in a management operation, and to fit generic and species-specific bark factors. The power model embedding diameter at breast height (D) and tree stem or total height (H) presented the greatest prediction strength for both stem and total tree volume. Models including only D to predict total tree volume were similar to double-entry models regarding goodness-of-fit. Therefore, they may be useful in the context of subtropical closed-canopy forests, where the difficulty and uncertainty in H measurements are not trivial. Species-specific models fitted for Miconia cinnamomifolia (DC) Naudin. and Hyeronima alchorneoides Allemão surpassed generic models only for the former species. Nevertheless, the prediction improvement should offset the eventual extra efforts implied in the collection of reliable samples of these species. Finally, bark factors stood as a satisfactory tool for inside bark mean volume estimation.


INTRODUCTION
Among all Brazilian vegetation domains, the Atlantic Forest suffers the greatest impact from human activities driving forest fragmentation and degradation (Ribeiro et al. 2009).The fragments' extent varies widely within the states where the Atlantic Forest is found.In this regard, Santa Catarina, in Southern Brazil, is privilegedapproximately 29% of its territory is covered by native forests (Vibrans et al. 2013), which in turn is divided into three main forest types/subdomains: evergreen rainforest (ERF), Araucaria forest, and seasonal deciduous forest (Klein 1978).In this state, forests are usually very fragmented, with 80% smaller than 50 ha.The ERF covers 40.5% LAIO Z. OLIVEIRA et al. of its 29,282 km² original extension, the largest in proportion among the three forest types (Vibrans et al. 2013).Notwithstanding, most of these remnants (~95%) are composed by second-growth forests, where pioneer and early secondary species dominate the successional process (Siminski andFantini 2004, Schorn andGalvão 2006).
In this context, fast-growing species, such as Hyeronima alchorneoides Allemão, Miconia cinnamomifolia (DC) Naudin., Nectandra spp., and Ocotea spp., produce good quality timber, making secondary forest management attractive for land owners (Fantini et al. 2016, Siminski et al. 2016).Moreover, several authors agree on the potential benefits of pure or mixed plantations of suchlike species (Klein 1980, Schuch et al. 2008, Coradin et al. 2011).Both approaches would indeed reconcile the misunderstood mutually exclusive conflicting purposes of conservation, on the one hand, and use of forest resources in the region, on the other (Montagnini and Jordan 2005).Tailored planning of such a win-win solution would require a set of tools, including timber volume, biomass, and carbon stock estimation models.
Wood volume is one of the most essential pieces of information in the development of sustainable management programs.In forestry, regression models fitted to field data regarding measured tree volume and predictor variables such as diameter at breast height (D) and tree or stem height (H) prove to be feasible tools to estimate individual tree volume.Regression models may synthesize the relationship among variables and are hence adopted to predict a variable of interest on the basis of other subjects to an easier collection (Picard et al. 2012).Elsewhere, volumetric models have been developed for several purposes in Brazil, ranging from local to broader usages, fitted for a single species or for a group of species making up a particular forest type.Indeed, models were fitted for species with local economic interest like Mimosa scabrella Benth.(Machado et al. 2008) and Ocotea porosa Nees & Mart. (Santos et al. 2012), as well as for species that produce tannin, oil, and cork (Scolforo et al. 2008).For instance, Barreto et al. (2014) developed volumetric models for a community forest management project in the state of Pará using data from 132 trees of 23 commercial species.In turn, Vibrans et al. (2015) fitted generic and species-specific stem volume models for three forest types in Santa Catarina using data of 2,127 trees with diameters ranging from 10 to 76 cm.
Despite the availability of volumetric models in the literature, there is a lack of models to predict individual tree total volume (i.e., stem plus branches).The data for model fitting is usually obtained through destructive methods, i.e., tree harvesting (e.g., Scolforo et al. 2008).However, legal restrictions and operational constraints in protected native forests may forbid the harvesting of trees, thus impeding data collection.A usual workaround is sampling collection by means of tree climbing, yet a viable alternative to gather stem volume data (e.g., Vibrans et al. 2015).Partly, this could be the explanation for the scarcity of total tree volume models for the Atlantic Forest subdomains in Brazil.To account for this, in this study we built on the research carried out by a consortium of universities and the state environmental agency on a secondary forest management controlled pilot in Santa Catarina (Fantini et al. 2016).This trial compiled a unique data collection from a large sample of harvested trees of species occurring in secondary forest fragments.As mentioned above, this kind of data is not easily accessible for either physical or regulation limitations regarding forest inventory techniques.
Hence, the objectives of the present study were (i) to fit generic and species-specific stem and total (i.e., stem plus branches) volume models using a robust cross-validation statistical procedure; (ii) to compare the performance of generic and species-specific models in predicting the volume of abundant and economically relevant species, VOLUMETRIC MODELS FOR SECONDARY FOREST STANDS 3731 (2015), this forest has a density of 1,807.6 trees ha -1 (D ≥ 5 cm) and basal area of 32.3 m² ha -1 ; H. alchorneoides and M. cinnamomifolia are the dominant species, contributing to 23.2% and 6.5% of the basal area, respectively.
According to the Köppen classification, the study site is influenced by the Cfa climate typesubtropical mesothermal humid climate with a hot summer and without a dry season (Alvares et al. 2013).The rainfall is evenly distributed throughout the year, with an annual average of 1,900 mm; the average annual temperature ranges from 19.1 to 20.0 °C (Pandolfo et al. 2002).The study site altitude above sea level ranges from 205 to 430 m.An association of two soil types predominates in the study site: moderate A Tb haplic Cambisol and moderate A Litholic Neosol; both may present argillaceous texture (Embrapa 2004).
namely Miconia cinnamomifolia and Hyeronima alchorneoides; (iii) to further examine the statistical implications of distinct relationships between dendrometric variables in volume models; and (iv) to provide generic and species-specific bark factors to generate inside bark volume estimates.

STUDY AREA
The study site comprises 42 ha located in the state of Santa Catarina, Southern Brazil (26°32'01"S; 49°02'30"W), within the ERF region (Figure 1) (Klein 1978, Oliveira-Filho et al. 2015).The site is covered mainly by a 35-year-old second-growth forest, unmanaged for the last 30 years, resulting from succession after the abandonment of pastures and partially enriched with typical species of such forests in the region.According to Silva et al.

DATA COLLECTION
The total (i.e., stem plus branches) and stem volume (V) of 288 felled trees from 31 families and 65 species were determined by use of the Smalian's formula (Avery and Burkhart 2015).The diameter at 1.3 m from the ground (D) ranged from 5.4 to 56.0 cm; the total tree height (H t ) ranged from 2.5 to 33 m, and the stem height (H s ) ranged from 1.3 to 25.2 m.The stem diameter was measured at the heights of 0.1, 0.3, 0.7, 1.0, 1.3, 2.0, and 3.0 m and consecutively at every meter until the upper end, which was defined by a significant bifurcation.The diameter of each branch at every meter from its base up to the diameter top limit of 5 cm was also measured.Diameters were measured using a diameter tape or a caliper.When using a caliper, two cross-sectional measurements were recorded.The tree bark thickness was measured using a digital caliper at the same points where the diameter had been measured.The total and stem heights were measured using a tape.In total, the 288 trees added up to 168.3 m³ of inside bark volume, out of which 58.1 m³ (34.5%) corresponded to branches ≥ 5.0 cm.

FITTING AND SELECTING GENERIC MODELS
Ten generic models were fitted (Table I) to predict individual tree (i) outside bark stem volume (m³) and (ii) outside bark total volume (m³).Each model was fit to a randomly selected 70% subset (n = 202) from the entire dataset (n = 288).The remaining subset (n = 86) was used to assess model performance and, subsequently, to select the best models.Influential observations (i.e., outliers and leverage points) were investigated through the association of the Cook's distance-calculated for each observation in the fitting subsets-and the F distribution percentile with p and n -p degrees of freedom, where p is the number of regression parameters, and n is the number of observations in the fitting subset.Observations scoring F > 0.50 were regarded as influential and were removed (Neter et al. 1996).
After ordinary least squares model fitting, the uncertainty associated to the regression parameters estimation was assessed through the percent relative standard error (PRSE) (Eq. 1) (Sileshi 2014).The models' bias (Eq.2), adjusted coefficient of determination (R²), root mean squared error (RMSE, Eq. 3), mean absolute percentage error (MAPE, Eq. 4) as per Sileshi (2014), and the corrected Akaike's information criterion (AICc, Eq. 5) were calculated.The original scale of the response variable was used in all metrics.The R² may be denoted as pseudo-R² (R²*) when calculated for nonlinear models, as its underlying assumptions are not completely fulfilled (Anderson-Sprecher 1994).The procedure was repeated 1,000 times, from which the mean values of the regression parameters and their PRSE were determined, together with the mean values of the goodness-of-fit metrics (bias, RMSE, MAPE, AICc).In this cross-validation approach, the metrics may be noted as follows: where k θ ˆ= kth estimated regression parameter; SE = standard error; V i = observed volume of the ith tree; V ̂i = predicted volume for the ith tree; n = number of trees used in the model selection procedure; R = number of iterations, i.e., 1,000; p = number of regression parameters; and SSE = sum of squared errors of the regression.
The selection of the best models was supported by the Akaike weights (Wagenmakers andFarrell 2004, Sileshi 2014), denoted by w(AICc).Likewise, the PRSE of the best performance models were checked.Based on Sileshi (2014), a PRSE > 25% threshold was set to account for nontrivial uncertainty in the parameter estimation that should be evaluated1 .When a given parameter of the best models yielded PRSE > 25%, it was eventually dropped off in model fitting.Subsequently, the performance metrics were recalculated.Baskerville's (1972) correction factor (BCF) was applied to the predictions derived from linearized power models (i.e., the predicted volume was multiplied by the BCF).The BCF is given by exp(σ² res ) 0.5 , where σ² res is the residual variance.
The predictions and residuals generated by the selected models were evaluated by observed vs. predicted values scatter plot inspection, hereafter denoted as 1:1 plot.The residuals (m³) were likewise plotted and weighted by their respective observed values (y-axis) vs. the predicted values (x-axis) for evaluating uncommon patterns of residual distribution.

FITTING AND SELECTING SPECIES-SPECIFIC MODELS
Analogously, species-specific models were fitted for the two main species in the study site, namely M. cinnamomifolia (n = 29; D range: 14.0-56.0cm; H t range: 15.2-27.3m; H s range: 4.6-18.1 m) and H. alchorneoides (n = 53; D range: 6.2-46.9cm; H t range: 6.5-27.8m; H s range: 2.5-25.2m).Due to the smaller sample sizes, the leave-one-out cross-validation approach was chosen.The models' bias (%), R²*, MAPE (%), RMSE, AICc, and w(AICc) were reported.The performances of the best generic model for stem and total volume were compared by applying them using each 'leftout' observation.
Specific and generic models were compared through joint 95% confidence intervals of their parameters built using the 'ellipse' package in R.
The ANOVA for nested models (Picard et al. 2012) was eventually applied to examine dendrometric relationships in the volume models; we calculated the relative importance of the predictor variables by using the 'relaimpo' package in R to assess their role in explaining the response variable in the stem and total volume models.To check the collinearity between predictor variables, the variance inflation factor (VIF) was calculated by the use of the 'car' package in R.

GENERIC MODELS
The generic models for stem volume presented RMSE ranging from 0.08 to 0.15 m³ and MAPE ranging from 9.7 to 44.8%.The model 10 presented the best performance; it yielded RMSE = 0.07 m³ and MAPE = 9.7%, and the uncertainty in the estimates of its parameters was small (PRSE < 5%; Table I).The model showed positive bias, although it was close to zero (1.1%).The 1:1 plot of the residuals revealed a good adjustment of the model to the data, and no evidence of strong heteroskedasticity was discernible.
The generic models for total tree volume presented inferior performance compared to the stem volume models.Their RMSE ranged from 0.17 to 0.20 m³ and MAPE ranged from 16.0 to 72.3%.The greatest performance was achieved by model 4 according to the w(AIC) (Table I).However, we chose not to rely solely on this metric, as the model yielded a MAPE of 30.7% and bias of -12.7%.Additionally, its parameters yielded a PRSE value greater or close to 30% due to the evident collinearity between the terms D² and D²H.Because of these results, we evaluated models 3, 9, and 10 more closely, as they outperformed the others.The intercept of model 3 rendered PRSE > 50%; therefore, we dropped it off and fitted the model again.The three models presented similar performance regarding RMSE and MAPE; models 9 and 10 presented slightly smaller MAPE.The uncertainty in the estimates of the parameters of these models may be considered acceptable (PRSE < 25%) according to the standards proposed by Sileshi (2014).Finally, we may indicate model 10 as the most reliable among the three, because the compound term D²H-included in models 3 and 9-may generate great leverage observations and therefore should be avoided (Sileshi 2014).Some degree of heteroskedasticity was noticeable, although the weighted residual plots showed evenness along the range of predicted values (Figure 2).
The overall quality of the models may be considered satisfactory, thus providing useful volume estimation tools in forest management operations in the Southern Brazilian ERF.The stem volume models outperformed the total tree volume models, as expected.Stem volume data are usually more homogeneous (i.e., less variability) than total tree volume data due to the fact that stems' geometrical attributes are better addressed by a combination of D and stem H.Although the models to estimate total tree volume showed an inferior performance, they are still useful, especially facing the lack of such models for native species in the ERF and elsewhere in Brazil.As stated above, the destructive nature of such a data collection process partially explains the scarcity of models, although efforts could target standing tree measurements for building models to predict total tree volume for native species, in the line of Scolforo et al. (2008) in Minas Gerais state.The subject is more than critical when the wood volume of the crown is proportionally large in relation to the total tree volume.As a matter of fact, in our study, the branches (≥ 5 cm) represented 34.5% of the trees' total volume.This impressive amount of wood is not to be neglected in management systems conducted in secondary forests; it may well be used as firewood or for coal production.Indeed, the wood of M. cinnamomifolia and H. alchorneoides provides a calorific value comparable to Eucalyptus species (Brand et al. 2013, Carvalho et al. 2014).
Generic models to estimate tree total volume using only D as the predictor variable presented an acceptable performance, with MAPE values smaller than 20% (e.g., models 7 and 8).Models 7 and 8 presented quite similar prediction strength compared to model 10, which includes D and H (see Table I).In the case of generic models that estimate stem volume using only D as a predictor, the MAPE values were greater than 25%.Indeed, stem volume models for tropical species that do not include stem H may yield a mean squared error    Temesgen et al. 2015).In fact, studies addressing the uncertainty in the predictions yielded by volumetric models using visually estimated or model-predicted tree heights as inputs would be useful (e.g., McRoberts et al. 2016).

Generic
In regards to methodological aspects, our study exposed a neglected topic highlighted by Sileshi (2014): models with collinear or non-explicative terms may yield parameters with large standard error, despite the eventual satisfactory performance conveyed by metrics such as RMSE and AIC (e.g., see models 5 and 6 in Table I for a clear example).Therefore, we strongly suggest the use of PRSE, standard errors, or confidence intervals of the parameters in model selection.Accordingly, models with nonsignificant parameters (i.e., not statistically different from zero) should be discarded.The PRSE should be evaluated considering the remarks of Picard et al. (2015) and Moser and Oliveira (2017), especially when adopting the parameter exclusion threshold suggested by Sileshi (2014).Reporting at least one of the aforementioned statistics is mandatory, as stated in the guidelines provided by Jara et al. (2015).

SPECIES-SPECIFIC MODELS
Not surprisingly, for both species, stem volume models using D and H outperformed models using only D (Tables II and III).The latter models yielded RMSE ranging from 0.15 to 0.37 m³ and MAPE > 20%.For M. cinnamomifolia, only two specific stem volume models (9 and 10) performed better than the generic stem volume model (model 10) in the leaveone-out procedure.The generic model yielded bias = -2.6%,MAPE = 8.65%, RMSE = 0.07 m³, and AICc = -150.Model 10 fitted for M. cinnamomifolia presented the best performance considering all metrics (Table II).The joint confidence intervals of its parameters did not overlap the ones built for the generic stem volume model.We therefore conjectured that the models were not equivalent, even though differences in their predictions in the 1:1 plot are subtle (Figure 3).
For H. alchorneoides, none of the specific stem volume models outperformed the generic stem volume model applied for this species.Only model 10 presented similar performance to the generic model, and the joint confidence intervals built for their parameters overlapped.The generic model showed a slightly better performance than the best specific model (10); it yielded bias = -0.3%,MAPE = 7.5%, RMSE = 0.05 m³, and AICc = -310.Nevertheless, the 1:1 plot revealed that the differences between the specific model and the generic model were almost unperceivable (Figure 3).
According to the w(AICc), the best specific total volume model for both species was model 7, which included only ln(D) as the predictor variable (Tables II and III).However, models using D and H, such as models 9 and 10, outperformed model 7 regarding the MAPE, although the parameter associated with ln(H) in model 9 yielded PRSE > 25% for both species (see the discussion below).In most cases, the generic model presented inferior performance according to the metrics, even though the 1:1 plots revealed similar patterns regarding their predictions (Figure 3).
Heteroskedasticity in the residuals was observable, especially for H. alchorneoides.The megaphone pattern is expected in biological data, as the magnitude of the residuals increases with the increase in the magnitude of the predictor variables (Picard et al. 2012).However, heteroskedasticity is an issue only when the estimation of confidence intervals for predictions is required (Neter et al. 1996, Picard et al. 2012).
The comparison between generic and speciesspecific models revealed important findings.For species with characteristic and regular architecture like M. cinnamomifolia, specific models might be useful for attaining better predictions.Specific models were more accurate for M. cinnamomifolia than for H. alchorneoides, possibly because the former species has a more regular architecture.Nonetheless, the better performance of speciesspecific models may not justify extra efforts demanded in data collection of reliable samples, even for species with great management potential, for which extra accurate volume prediction would be helpful.As a word of caution, we point out that the two species considered in our study represented together 28.5% of the trees in the dataset.LAIO Z. OLIVEIRA et al.   alchorneoides, D and H were strongly correlated (r = 0.86, p < 0.01) (Figure 4).The collinearity (VIF = 5.24) inflated the standard error of the parameter associated with ln(H), the variable showing weaker relationship with the response variable.As a comparison, the parameter associated with ln(H) yielded PRSE = 33%, while the parameter associated with ln(D) yielded PRSE = 5%.The difference between the explained variation of ln(V) by collinear predictor variables is noticeable: R²* ln(D) = 0.57 and R²* ln(H) = 0.43-the R²* was split between the surrogate variables.A model using only ln(D) would have yielded R²* = 0.98.This brief explanation might support the results regarding the model ln(V) = ln(a) + b • ln(D) + ε as one of the best performances for total tree volume prediction for both species.

Species
As stressed by Picard et al. (2015), collinearity should not overwhelm researchers, especially when dealing with relevant variables for volume or biomass prediction, such as D and H. Collinearity between these variables implies that their regression parameters are not to be standalone interpreted, though rather as an ensemble.Our example might well represent the paradox of maintaining or not maintaining a given parameter when its PRSE > 25% or 30%.Keeping the parameter associated with ln(H) in the model (even though presenting PRSE = 33%) would be a reasonable decision, because for equal D trees, H will act as a correction variable-e.g., the greater the H, the larger the volume estimate by the model (Picard et al. 2015).Evidence in favor of keeping the c parameter in the model is the significant difference (F = 8.9, p = 0.

BARK FACTORS
For the 288 measured trees, the bark represented 16.6% (SD = 4.8%) of their total volume and 14.7% (SD = 4.8%) of their stem volume.The mean bark factors ranged from 0.798 to 0.858, while presenting quite similar values (Table IV).
The mean paired differences between the observed and predicted inside bark volume were small (Table IV).Significant differences (nonetheless small) were found only for the stem volume of M. cinnamomifolia and H. alchorneoides.These results may validate the use of the calculated bark factors in mean inside bark volume estimation.

CONCLUSIONS
We may draw the main conclusions of this study as follows: (i) the overall quality of the models was satisfactory, and therefore they provide tailored LAIO Z. OLIVEIRA et al.
DEVELOPING AND VALIDATING BARK FACTORS Six bark factors were developed to estimate the inside bark total volume and stem volume of all species (n = 288), total volume and stem volume of M. cinnamomifolia (n = 29), and total volume and stem volume of H. alchorneoides (n = 53).The mean bark factor was estimated as the ratio between the inside bark volume and the outside bark volume.To evaluate the reliability of this approach-i.e., using bark factors for attaining mean inside bark volume estimates-the following steps were taken: (i) apply the best fitted volumetric model to the dataset; (ii) multiply the estimates by the respective bark factor; (iii) compare the mean volume generated in step (ii) with the observed mean inside bark volume through 95% confidence intervals for the mean paired difference using the standard t distribution procedure.LAIO Z. OLIVEIRA et al.
stem and total volume models fitted for tree species from the Southern Brazilian evergreen rainforest.a V ̂ = a + b

Figure 2 -
Figure 2 -Observed vs. predicted values generated by model 10 for stem and total volume.The plots are based on three independent iterations of the model fitting algorithm.

Figure 3 -
Figure 3 -Observed vs. predicted values generated by the best specific and generic stem and total tree volume models for M. cinnamomifolia and H. alchorneoides.

TABLE I
RMSE = mean root mean squared error (m³); MAPE (%) = mean absolute percentage error; AICc = mean corrected Akaike's information criterion among the runs of the leave-one-out procedure; BCF = Baskerville's correction factor.Values in parentheses next to the RMSE and MAPE refer to the coefficient of variation (%) among the runs of the leave-one-out procedure.Values in parentheses next to the AICc refer to Akaike weights probabilities.LAIO Z. OLIVEIRA et al.

TABLE IV Bark factors, mean paired differences between the observed total and stem inside bark volume and the predicted inside bark volume using the generic and species-specific models and bark factors.
± 0.0145 m³ ns -nonsignificant (p ≥ 0.05), * -significant (p < 0.05).and useful volume estimation tools in forest management in the Southern Brazilian ERF; (ii) species-specific models performed just slightly better than generic models.Accordingly, it is clear that the additional effort in specific data collection does not pay off; (iii) D single-predictor models are an acceptable solution due to the compromising issues regarding the operational measurement of H in subtropical closed-canopy forests.Eventually, such models may even outperform more complex ones in the case of species with more regular architecture (e.g., M. cinnamomifolia); (iv) quantification of uncertainty in the estimation of regression parameters reveals critical information for model selection and validation; (v) bark factors applied together with volumetric models are effective mean inside bark volume estimation tools. *