Modeling residual biomass from mechanized wood harvesting with data measured by forest harvester.

The lack of accurate models for estimating residual biomass in wood harvesting operations results in underutilization of this co-product by forestry companies. Without this information, the planning of forestry operations, such as chipping and transport logistics, is affected, with a consequent increase in costs. The aim of this study was therefore to propose and evaluate statistical models to estimate the residual biomass of Eucalyptus sp. in wood harvesting operations by means of tree variables measured by the harvester processing head. Generalized linear models were built through a stepwise procedure to estimate residual biomass from tree covariates: diameter at breast height, commercial height, commercial limit diameter, and stem commercial volume, including their transformations and combinations. The positively skewed residual biomass distributions support the use of a generalized linear model with a Gamma distribution in the random component, since the normality assumption of traditional linear regression was not satisfied in this study. The stepwise procedure selected tree variables associated with forest biomass, whose linear combinations resulted in models with high statistical efficiency and accuracy. Thus, the models developed in this study are innovative tools to estimate residual biomass in mechanized wood harvesting, which can be embedded in the harvester's hardware to provide real-time information.

log, pulpwood, and firewood. In addition, forest biomass for energy purposes has been highlighted in recent years due to the need to reduce carbon dioxide emissions and other gaseous pollutants (Lauri et al. 2014).
Forest biomass obtained from the wood harvesting operations of multiple-use stands is called residual biomass (Yemshanov et al. 2014). In the full-tree harvesting method, residual biomass can be composed of the tops, branches, leaves, and bark of harvested trees, which are piled at the edge of the cutting fields (Esteban et al. 2011) and later used for electricity cogeneration in the thermoelectric plants of forest industries (IBÁ 2017).
The lack of accurate models for estimating the residual biomass produced by harvesting operations results in underutilization of this co-product by forest companies. Without this information, forest planning and industrial operations, such as chipping and transport logistics, are affected, with a consequent increase in production costs.
According to Palander et al. (2009) and Vesa and Palander (2010), it is possible to estimate forest biomass in wood harvesting operations by means of data measured by the harvester processing head and stored in the on-board computer. Thus, models based on these data have the potential to be embedded in the machine's hardware to produce real-time information on the residual biomass generated by forest harvesting.
In wood harvesting with a harvester, the lengths, diameters, and volumes of the logs are recorded (Nieuwenhuis and Dooley 2006, Mederski et al. 2018). From these data, reports of harvested trees containing individual-level variables can be used as covariates in regression models. This information can be useful for estimating forest biomass in real time on the harvester's on-board computer.
The aim of this study was therefore to propose and evaluate statistical models to estimate the residual biomass of Eucalyptus in wood harvesting operations by means of tree variables measured by the harvester processing head. The proposed regression models aim to assist the planning of chipping and logistics in Eucalyptus stands, under the hypothesis that the variables obtained from the processing head enable accurate estimates of residual biomass.

STUDY AREA
This study was carried out in Eucalyptus stands under wood harvesting located in Paraná State, Brazil, at the coordinates 24°26' S and 50°45' W. The predominant climate of the region is subtropical, Cfa and Cfb according to the Köppen classification, with an annual average temperature between 18 °C and 20 °C and average annual rainfall of 1,400 to 1,600 mm, on terrain with an average slope of 8% at 940 m a.s.l. (Alvares et al. 2013).
The forest stands were composed of clones of Eucalyptus saligna Smith and of Eucalyptus grandis W. Hill ex Maiden × Eucalyptus urophylla S. T. Blake interspecific hybrids, both 7 years old and with an initial density of 1,111 trees per hectare at a spacing of 3.75 m × 2.40 m. The stand characteristics are described in Table I.
The full-tree harvesting method used by the company comprised a feller buncher, a skidder, and a harvester. The feller buncher felled the trees and stacked them in bundles inside the field, and the skidder dragged the bundles to the edge of the field. The harvester performed the final wood processing with a processing head (Table II), cutting 7.2 m logs for pulp down to the commercial limit diameters.
Before the wood harvesting operations, a forest inventory was carried out to characterize the diameter distributions of the Eucalyptus stands. Thirty trees per stand were then randomly selected, in proportion to five diameter classes determined by Sturges' rule (Table III), resulting in a sample of sixty trees on which individual tree variables were measured. These trees were scaled into logs with a length of 7.2 m, from a height of 0.1 m up to the commercial limit diameters ($d_i$) of 8, 10, 12, and 14 cm, to simulate the harvester processing head, and the commercial volumes ($v$) were determined by Smalian's formula. Additionally, diameter at breast height at 1.3 m above ground ($d$) and stem commercial height ($h$) were measured to obtain the individual-level tree variables that can be recorded by the harvester's on-board computer.
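The scaling step can be illustrated with a minimal sketch of Smalian's formula, which takes the volume of a log as the mean of its two end cross-sectional areas times its length. The function names and the diameters in the usage note are hypothetical and are not taken from the study data.

```python
import math

def smalian_volume(d_base_cm, d_top_cm, length_m):
    """Log volume (m^3) by Smalian's formula: mean end area times length."""
    area_base = math.pi * (d_base_cm / 100) ** 2 / 4  # cross-section in m^2
    area_top = math.pi * (d_top_cm / 100) ** 2 / 4
    return (area_base + area_top) / 2 * length_m

def commercial_volume(diameters_cm, log_length_m):
    """Sum Smalian volumes over consecutive section diameters of one stem.

    diameters_cm lists the diameters at the ends of consecutive logs, from
    the base up to the commercial limit diameter (illustrative input layout).
    """
    pairs = zip(diameters_cm[:-1], diameters_cm[1:])
    return sum(smalian_volume(a, b, log_length_m) for a, b in pairs)
```

For example, `commercial_volume([22, 18, 13, 8], 7.2)` would return the volume of three 7.2 m logs of a hypothetical stem that reaches an 8 cm limit diameter.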
Subsequently, the biomass compartments of tops with bark, branches, and leaves were quantified by the destructive method, in which samples were randomly selected and dried in a forced-air oven at 105 °C until constant mass. The residual biomass of each tree ($w$) was then determined from the wet biomass and moisture content of the components.
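The text does not give the exact calculation, so the following is only a sketch under a common assumption: the field wet mass of each compartment is scaled by the dry-to-wet mass ratio of its oven-dried sample, and the compartments are summed per tree. All names and numbers are hypothetical.

```python
def dry_mass(wet_mass_kg, sample_wet_g, sample_dry_g):
    """Dry mass of a compartment, assuming the sample's dry/wet ratio
    applies to the whole compartment (an assumption, not the paper's
    stated procedure)."""
    return wet_mass_kg * sample_dry_g / sample_wet_g

def residual_biomass(components):
    """Residual biomass w of one tree (kg): sum of compartment dry masses.

    components is a list of (wet_mass_kg, sample_wet_g, sample_dry_g)
    tuples, e.g. one each for tops with bark, branches, and leaves.
    """
    return sum(dry_mass(*c) for c in components)
```

A call such as `residual_biomass([(10, 200, 100), (4, 100, 50)])` would combine two hypothetical compartments whose samples lost half their mass on drying.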

MODELING RESIDUAL BIOMASS
Generalized linear models (GLM) were built to estimate residual biomass ($w$) from the covariates diameter at breast height ($d$), commercial height ($h$), commercial limit diameter ($d_i$), and commercial volume ($v$), considering their logarithmic, inverse, and square-root transformations, as well as the combinations $dh$, $d^2h$, $d_ih$, and $d_i^2h$. The covariates were selected by a stepwise procedure using the MASS package (Venables and Ripley 2002) in the R programming environment (R Core Team 2018).
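As an illustration of the candidate set described above, the sketch below builds the transformed and combined covariates for one tree. It only enumerates candidates and does not reproduce the stepwise selection, which the study ran with the MASS package in R. The function name and dictionary keys are hypothetical.

```python
import math

def candidate_covariates(d, h, d_i, v):
    """Candidate terms for one tree: raw covariates, their log, inverse,
    and square-root transformations, and the four combinations."""
    base = {"d": d, "h": h, "d_i": d_i, "v": v}
    terms = dict(base)
    for name, value in base.items():
        terms[f"log({name})"] = math.log(value)
        terms[f"1/{name}"] = 1 / value
        terms[f"sqrt({name})"] = math.sqrt(value)
    terms["d*h"] = d * h
    terms["d^2*h"] = d ** 2 * h
    terms["d_i*h"] = d_i * h
    terms["d_i^2*h"] = d_i ** 2 * h
    return terms
```

With four covariates, three transformations each, and four combinations, the function returns 20 candidate terms per tree.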
GLMs were used because they generalize ordinary linear regression to dependent variables with non-normally distributed errors. Thus, since residual biomass was non-normally distributed according to the Shapiro-Wilk test at the 5% significance level and showed positive skew, we adopted the Gamma distribution (1) for the random component (error), which is related to the systematic part (linear model) by means of the canonical link function (2) (Faraway 2016).
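To make the model structure concrete, here is a minimal sketch of fitting a Gamma GLM with the canonical (inverse) link, $1/\mu = \beta_0 + \beta_1 x$, by iteratively reweighted least squares. It is restricted to a single covariate and plain Python for readability, and it is not the authors' code: the study used R, and the function name, starting values, and tolerance here are assumptions.

```python
def fit_gamma_glm(x, y, iterations=25, tol=1e-10):
    """Fit 1/mu = b0 + b1*x (Gamma family, inverse link) by IRLS.

    Assumes all y > 0 and that b0 + b1*x stays positive during fitting.
    """
    mu = list(y)  # start from the observed values
    b0 = b1 = 0.0
    for _ in range(iterations):
        eta = [1 / m for m in mu]
        # Working response z = eta + (y - mu) * d(eta)/d(mu), with
        # d(eta)/d(mu) = -1/mu^2 for the inverse link.
        z = [e - (yi - m) / m ** 2 for e, yi, m in zip(eta, y, mu)]
        # IRLS weights 1 / (V(mu) * (d(eta)/d(mu))^2) = mu^2 for V(mu) = mu^2.
        w = [m ** 2 for m in mu]
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
        new_b1 = sxz / sxx
        new_b0 = z_bar - new_b1 * x_bar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        mu = [1 / (b0 + b1 * xi) for xi in x]
        if converged:
            break
    return b0, b1
```

Because the link is the inverse, a negative slope corresponds to a mean that increases with the covariate. Predictions follow from `1 / (b0 + b1 * x_new)`.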
Where: μ and / 0 are Gamma distribution parameters, and η is the link function. In addition, we considered the identification of discrepant values, or outliers, in the residual biomass estimates, as well as the possibility of removing them to improve the fits. For this, the studentized residuals (3) were plotted against the leverage values ($h_{ii}$), which express the importance of each observation for fitting the models, using the car package in R (Fox and Weisberg 2019). Thus, observations with studentized residuals ($t_i$) below -2 or above 2 and leverage close to zero were identified as outliers with a weak contribution to the modeling, which could be removed without losing the characteristics of the phenomenon.
Where: $e_i$ is the ordinary residual, $\sigma^2$ is the residual variance, and $h_{ii}$ is the leverage. The models were evaluated based on the significance of the regression coefficients ($\beta_i$) at the 1% significance level by the t-test; Akaike's information criterion (AIC), to select the model whose combination of covariates results in the lowest value (4); the coefficient of determination ($r^2_{y\hat{y}}$), given by the square of the correlation between observed and estimated biomass values; the normalized root-mean-square error (NRMSE) as a measure of accuracy, where values close to zero indicate low residual variance (5); and graphical analysis of Pearson's residuals ($r_{p_i}$) to indicate the absence of systematic tendency in the estimates (6).
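The outlier screening can be sketched from the quantities defined for equation (3). The example below uses ordinary least squares with one covariate as a simple stand-in, computing $t_i = e_i / (\sigma \sqrt{1 - h_{ii}})$. The study obtained these diagnostics for its GLMs with the car package in R, so the function names and the leverage cut-off here are hypothetical.

```python
import math

def leverage_and_studentized(x, y):
    """Leverages h_ii and studentized residuals t_i for a simple OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    studentized = [e / math.sqrt(sigma2 * (1 - h))
                   for e, h in zip(residuals, leverages)]
    return leverages, studentized

def flag_outliers(leverages, studentized, t_limit=2.0, leverage_limit=0.1):
    """Indices with |t_i| > t_limit and low leverage.

    leverage_limit is an assumed cut-off for "close to zero"; the paper
    does not state a numeric threshold.
    """
    return [i for i, (h, t) in enumerate(zip(leverages, studentized))
            if abs(t) > t_limit and h < leverage_limit]
```

Observations returned by `flag_outliers` are the candidates for removal under the rule described above; high-leverage points are kept even when their residuals are large.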

$$\mathrm{AIC} = -2\ln(L) + 2p \tag{4}$$
Where: $\ln(L)$ is the log-likelihood function, $p$ is the number of model parameters, $\bar{y}$ is the mean value of residual biomass, $y_i$ is the observed value of residual biomass, $\hat{y}_i$ is the estimated residual biomass, $n$ is the number of observations, $\mu_i$ is the estimated mean of residual biomass, and $V_i$ is the estimated variance of residual biomass. Additionally, half-normal plots with simulated envelopes at the 95% confidence level were produced to indicate whether the models were adequately specified. For this, studentized residuals were plotted against the theoretical quantiles, and 99 simulations were carried out using the estimates of the fitted models through the hnp package in R (Moral et al. 2017).
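The evaluation statistics can be sketched from the definitions in the text. Equation (4) is used as given. The NRMSE is assumed here to be the root-mean-square error divided by the mean observed biomass, since the mean is among the listed symbols, and the Pearson residual is taken as the raw residual divided by the square root of the estimated variance. Function names are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's information criterion, AIC = -2 ln(L) + 2p."""
    return -2 * log_likelihood + 2 * n_params

def r2_yy(observed, estimated):
    """Squared correlation between observed and estimated values."""
    n = len(observed)
    mo = sum(observed) / n
    me = sum(estimated) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov ** 2 / (var_o * var_e)

def nrmse(observed, estimated):
    """RMSE divided by the observed mean (assumed normalization)."""
    n = len(observed)
    mse = sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n
    return math.sqrt(mse) / (sum(observed) / n)

def gamma_variance(mu, dispersion):
    """Gamma variance, dispersion * mu^2."""
    return dispersion * mu ** 2

def pearson_residuals(observed, mu_hat, var_hat):
    """Pearson residuals, (y_i - mu_i) / sqrt(V_i)."""
    return [(y - m) / math.sqrt(v) for y, m, v in zip(observed, mu_hat, var_hat)]
```

For a Gamma model, `var_hat` can be built with `gamma_variance` from the fitted means and an estimate of the dispersion.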

RESULTS
The residual biomass of the Eucalyptus stands showed non-normal distributions, by the Shapiro-Wilk test at the 5% significance level, and positive skew (Figure 1), which corroborates the use of the Gamma distribution for the random component of the generalized linear models in this study. Thus, traditional statistical techniques that require the normality assumption are not entirely adequate for analyzing the residual biomass variable in wood harvesting operations.
In the plots of studentized residuals against leverages (Figure 2), discrepant values were observed in the modeling of the E. saligna (Figure 2a) and E. grandis × E. urophylla stands (Figure 2b), with studentized residuals below -2 or above 2. Among these observations, five points in Figure 2a and seven in Figure 2b had leverage values close to zero, indicating that they contribute little to the modeling and can be removed.
The stepwise procedure resulted in the selection of the covariates commercial volume ($v$), diameter at breast height ($d$), commercial height ($h$), commercial limit diameter ($d_i$), the $\log d$ transformation, and the combinations $dh$, $d^2h$, $d_ih$, and $d_i^2h$. Thus, regression models were composed for the E. saligna (7) and E. grandis × E. urophylla (8) stands to estimate residual biomass ($w$). As shown in Table IV, these models had the lowest values of the Akaike information criterion (AIC), their regression coefficients ($\beta_i$) were statistically significant at the 1% significance level, and they explained the variance of residual biomass efficiently, with coefficients of determination ($r^2_{y\hat{y}}$) greater than 0.9. In addition, the normalized root-mean-square error (NRMSE) of the models corroborated their high statistical accuracy, with values close to zero indicating low residual variances.
The Pearson's residuals (Figures 3a and 3b) were evenly distributed, showing the absence of heteroscedasticity and of systematic errors in the biomass estimates. In addition, the half-normal plots (Figures 3c and 3d) indicated that the fitted models were satisfactory, since the studentized residuals fell inside the simulated envelopes at the 95% confidence level.

DISCUSSION
Some studies have evaluated regression models to estimate stump biomass (Palander et al. 2009, Vesa and Palander 2010), productivity (Gallis and Spyroglou 2012, Palander et al. 2017, Brewer et al. 2018), and production and costs (Silayo and Migunga 2014) in wood harvesting methods. However, none of them proposed and evaluated statistical models to estimate the residual biomass of Eucalyptus stands or other forest species by means of tree variables measured by the harvester processing head. This shows that the results of this study are relevant both scientifically and for application in the forest sector.
The positively skewed residual biomass distributions (Figure 1) support the use of a generalized linear model with a Gamma distribution in the random component, since the normality assumption of traditional linear regression was not satisfied in this study. In addition, the Gamma model is suitable for the analysis of non-negative continuous data (Manning et al. 2005, Faraway 2016), such as the residual biomass of wood harvesting.
Generalized linear models also have the advantage of eliminating the practice of transforming the dependent variable in order to satisfy the classical linear regression assumptions (Cordeiro and Andrade 2009, Lo and Andrews 2015). Thus, the statistical models built in this study provide a more direct explanation of the phenomenon under investigation. This is especially useful for data from forest harvesting research.
The stepwise procedure selected variables associated with forest biomass (Table IV), such as the commercial volume, which has a relationship with wood density and, therefore, with woody components of residual biomass (Somogyi et al. 2007). Also, the variables diameter, height, and their combinations can be associated with the tree component mass that constitutes the tree canopy in residual biomass.
The linear combinations of these variables resulted in models with statistical efficiency and accuracy for estimating residual biomass from tree measurements recorded on the harvester's on-board computer. This was evidenced by the high $r^2_{y\hat{y}}$ and low NRMSE values (Table IV), as well as by the evenly distributed residuals (Figure 3), although symmetrically distributed residuals are not a required assumption for generalized linear models (Faraway 2016).
Thus, the models developed in this study are innovative tools to estimate the residual biomass in mechanized wood harvesting of Eucalyptus stands, which can be embedded in the harvester's hardware to provide real-time information for operational planning and logistics. In addition, we suggest that other factors that also influence biomass production, such as age, species or genetic material, and forest site productivity (Palander et al. 2009, Eloy et al. 2018), be investigated for incorporation into the models.

AUTHORS CONTRIBUTIONS
RODRIGUES CK, LOPES ES, FIGUEIREDO FILHO A and PELISSARI AL conceived and designed the study; RODRIGUES CK and SILVA MKC performed the study; RODRIGUES CK and PELISSARI AL analyzed the data; LOPES ES and FIGUEIREDO FILHO A contributed to materials/analysis tools; RODRIGUES CK and PELISSARI AL wrote and revised the paper.