HEIGHT-DIAMETER RELATIONSHIPS FOR Eucalyptus grandis HILL EX. MAIDEN IN MOZAMBIQUE: USING MIXED-EFFECTS MODELING APPROACH

mixed-effects The perspective is to increase accuracy, practicality, precision of height estimations and decrease of sampling effort in forest inventories in Mozambique ABSTRACT Equations express that height-diameter relationships are used to estimate tree heights that were not measured in the plots, as well as to calculate their volumes. In this study, we modelled height for Eucalyptus grandis Hill ex. Maiden stands using nonlinear mixed effect models


INTRODUCTION
The forest sector in Mozambique figures among the most important economic activities in the country, representing around 3.8% of the Gross Domestic Product (GDP) originated in the primary production, corresponding to 570 million US dollars in 2019 (WORLD BANK, 2020). There are around 7.3 million hectares with potential for forest investment in the country, with emphasis for the silvicultural area, which is able to generate around 10 billion US dollars annual revenue and approximately 300.000 formal jobs (Bleyer et al., 2016;Nube et al., 2016;MITADER, 2016). Forest plantings are concentrated mainly in Eucalyptus spp. and Pinus spp. with various purposes including cellulose, timber, lampposts, demand of vegetable energetic biomass, which has grown over the years (Bila and Issufo, 1994;FAO, 2001;Coetzee and Alves, 2005;MINAG, 2009;Overbreek et al., 2012;MITADER, 2016). We give emphasis to Eucalyptus grandis Hill ex. Maiden, which has been largely cultivated by the forest industry and small rural owners throughout the country, associated to the good performance of plantings, easy adaptation to the environment, quick growth in different sites and good timber quality (Willan, 1981;Chitará, 2003;Shimizu, 2006;MINAG, 2009;Maússe-Sitoe et al., 2016;Guedes et al., 2016).
In order to promote the growth and investment of the forest sector in the country, the government of Mozambique has created and approved the National Strategy for Reforesting, whose objective was to develop silviculture aiming to encourage the increase of cost-effective commercial and industrial plantings, on a competitive and sustainable basis from the economic, social and environmental perspective (MINAG, 2009). The government expectation is to increase the areas of commercial forest planting from the current 64,000 ha to 1,000,000 ha in 2030. For this implementation, investments around 50 million US dollars will be necessary (Chitará, 2003;Nhantumbo and Salomão, 2010;Nhantumbo et al., 2013;MITADER, 2016), which need to be properly managed with an adequate longterm planning (MITADER, 2018).
For this purpose, one of the assumptions is that a manager has knowledge about forest productivity during the whole growth cycle, what allows a broader control that culminates in cost-effective and solid decisions in the forest enterprise (Sharma and Parton, 2007;Crecente-Campo et al., 2010). In this sense, it is vital to know the dynamics of tree growth as well as to determine their magnitude. Height and diameter are the most relevant dendrometric variables since they provide the values for volume, biomass, site quality, growth and stand productivity (Castedo-Dorado et al., 2006;Pretzsch, 2009;Bahadur et al., 2019). Thus, the study of biometric relations such as hypsometric relation (h/d), allows to know tree height with precision, reducing time and cost during data collection for forest inventory (Soares and Tomé, 2002;Calama andMontero, 2004, Mendonça et al., 2015), with the measurement of a number of trees, estimating non-measured heights with hypsometric equations (Mehtätalo, 2005;Corral-Rivas et al., 2007;Machado and Figueiredo Filho et al., 2009). According to Dorado and Diéguez-Aranda (2006), modelling allows to adjust equations that estimate height according to diameter, including other variables such as age, site index or dominant height, improving the quality of adjustments of height curves. Costa et al. (2016) outline that the use of classic regression models requires meeting key assumptions such as independence among observations and homogeneity of variance, which are not always achieved, thus indicating explicit modelling of the covariance structure and the use of mixed models, what allows to generalize structures of spatial-temporal correlations and non-constant variance (Adame et al., 2008;Yang and Huang, 2009;Ercanli et al., 2015). Mixed models are one of the most sophisticated regression techniques with adjustment of a more consistent model, allowing to better verify model variability (Pinheiro and Bates, 2000;Trincado et al. 2007;Bueno-López and Bevilacqua, 2012). Thus, mixed effect models help include a set of non-observable variables in the model structure, called random effects, which incorporate tree and/or plot variability, depending on the study objective, in order to allow its use with observable variables, or fixed effects (Calegario et al., 2005;Huang et al., 2009;Paulo et al., 2011;Ferraz Filho et al., 2018). Although there are no doubts about the advantages of the use of mixed models to avoid biased estimations, there is still some uncertainty about the advantage of its use for accuracy purposes (Littel et al., 2006, De-Miguel et al., 2012Bahadur et al., 2019). Thus, in order to provide better estimations for the mean response using a new set of data, random effects can be estimated when information about a sample of peer data of the hypsometric relation (h/d) is available. This procedure -calibration (or location) -and the resulting response, is called calibrated response (Lappi, 1991;Paulo et al., 2011;Gómez-García et al., 2015).
Within this framework, the present study aims to i) increase the accuracy and precision of equations to predict height of Eucalyptus grandis, by using nonlinear mixed models, in Mocuba District, Zambézia Province, Central Mozambique; and ii) evaluate calibration strategies for random effects, including different numbers of trees measured in the stand by subsequently incorporating them to height estimation of trees that were not measured in the stand.

Study area
This study was carried out in a stand of Eucalyptus grandis belonging to N'tacua Florestas Company Ltda., in the forest community of Conono, in Mocuba District (16°51' S latitude and 36°59' E longitude), at Zambézia Province, Mozambique. The climate in the region is Aw according to Köppen classification, with mean annual temperature 22.6 °C and mean annual precipitation 1200 mm occurring between October and March (MAE, 2005). Soils are sandy and clay with predominance of dystrophic red latosol (Ibraimo, 2004). The study area was characterized geomorphologically as plain relief and slightly wavy terrains with declivities between 10.0% and 15.0%, and mean altitude of 400 m. The planting was established in 2007 in an area of 13.0 ha, with 3 m x 3 m spacing (1111 trees.ha -1 ). In the area, 260 kg.ha -1 of NPK fertilizer were applied, with the formulation 06-30-06, at three months, two years, and four years old.

Data description
A conventional forest inventory was performed in June 2015 (when the stand was 8 years old), through systematic sampling. Forty temporary plots with area of 400 m2 (20m x 20m) were measured. For each tree, diameter at breast height (c) was measured with a measuring tape, and height (h) with Blume-Leiss hypsometer. The circumference at breast height (c) was converted into diameter (d). For each plot, the mean dominant height by Assmann (h100) was determined, which means it was equivalent to the 100 thickest trees per hectare, the tree diameter of mean basal area (dg) and basal area per hectare (G). In total, 1414 trees were measured in the field. Summary statistics of data used in this study are present in Table 1.

Modeling of height-diameter relationship
Fixed model (traditional models) Six nonlinear hypsometric models (h/d) (M1-M6), proposed by Scolforo (1998), Huang et al. (2000), Bartoszeck et al. (2004), Barros et al. (2004), Sharma and Parton (2007) and Pretzsch (2009) were adjusted to data of 30 plots. Where: β 0 and β 1 = estimated regression coefficients; h=height, in m; d=diameter at breast height, in cm. Mixed effect modelling with stand variables The structures of nonlinear fixed effect models (M1-M6) described above (traditional models) are based on the assumption that ε ij is distributed with variance σ2, with ε ~ N (0, σ2). The method of least squares assumes that parameters β 0 and β 1 are fixed and stable in all sample units (in the plot and in the stand level) within the whole stand. Consequently, these models include only fixed effects that provide mean predictions of height for the stand level.
However, the height-diameter ratio varies within sample units in the same stand level, since conditions and structures of the stand are specific regarding density, terrain, and soil attributes, with significant variability. On the other hand, the hierarchic structure of models results in highly correlated observations for height and diameter measured in the same plot. In order to solve this problem, prediction systems based on the mixed effect modelling procedure, simultaneously including fixed and random parameters in the model structure, were used in the best predictive model selected (traditional models), according to Calama and Montero (2004), Sharma and Zhang (2004), Castedo Dorado et al. (2006), Trincado et al. (2007), Sharma and Parton (2007) and Adame et al. (2008). The general expression for the model can be defined as, Where: β 0 ; β 1 ; a1 and a2= are fixed parameters of the model; u1 and u2 = random parameters, specific to the ith plot; h = height, in m; h 100 = dominant height, in m; G = basal area, in m 2 per hectare; d = diameter at breast height, in cm; dg = quadratic mean diameter, in cm. HOFIÇO et al [7] Calibration of the mixed model For the estimation of random parameters, tree heights were tested for different calibration scenarios (tree diameter: mean, median, dg, hohenadl d-, hohenadl d+, dominant for Assmann and its relationships) corresponding to 10 plots used for calibration. Thus, it was possible to calculate random parameters according to the expression proposed by the Bayesian estimator (Vonesh and Chinchilli, 1997;Sharma and Parton, 2007). As suggested, these sample alternatives for the calibration responses included the selection of trees based on specific categories of stand size (Calama and Montero, 2004;Yang et al., 2009;Gómez-García et al., 2015;Ercanli et al., 2015).

Model fitting and evaluation
Models were evaluated according to their accuracy and precision by the statistical criteria: coefficient of determination (R 2 ), standard error of estimation (RMSE), Akaike information criterion (AIC), Bayesian Information Criterion (BIC) and graphical residual analysis. The lowest values of the AIC, BIC, RMSE and the highest of R 2 indicate better results for model adjustment, according to Pinheiro and Bates (2000), Castelo Dorado et al. (2006), and Sharma and Parton (2007).
In this study, the predictions for variance components and fixed parameters of the model were obtained with the maximum likelihood estimation (ML) by PROC NLMIXED of the software SAS V. 9.2 (SAS Institute Inc., 2009). We tested different combinations of fixed and random parameters and compared the fitting statistics (RMSE, R2, AIC and BIC), to determine which parameter (s) should be considered mixed-effect. The Adaptive Gaussian Quadrature was used to numerically solve random effects in the SAS NLMIXED procedure, from which the detailed information was explained by Littel et al. (2006). The Lag residuals graphical analysis was used to verify residual autocorrelation. With the mean absolute error values (MAE), an analysis of variance (α = 1% and α = 5%) was performed entirely at random, where the treatments were respectively the MAE calculated for different subsample size scenarios in calibration, with the purpose of verifying if there were significant differences between the tested options.

Fixed models (traditional models)
The adjustment results (Table 2) showed significant regression coefficients (β0 and β1) (p < 0.05) in the six hypsometric models. According to the statistical criteria of adjustment and precision, models showed similar results, with no significant difference in their results. Overall, models also showed similar residual distribution (Figure 1), being thus proposed model 1 (M1) to describe the h/d ratio of the sampled data. Based on these criteria, M1 was selected and modified (M7), obtaining a new structure with population variables, modelled with a nonlinear mixed effect structure, simultaneously including random and fixed effects. Where: β 0 and β 1 = estimated regression coefficients; R 2 = coefficient of determination; RMSE = standard error of estimation; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion; <0.0001 = probability significance p value.

Mixed effect modelling
From the six possible combinations to test random effects (u1 e u 2) in the model structure (M7), the most appropriate according to the statistical criteria analyzed was the inclusion of the effect in the intercept and angular coefficients simultaneously (β 0 +u 1 ) e (β 1 +u 2 ), with the mixed model showing all the significant coefficients (p < 0.05) (Table 3) and the decrease of AIC and BIC values in relation to M1 (Table 2), in other words, 15.2% and 13.7%, respectively.
The graphic of the standardized residuals according to observations for the nonlinear mixed effect model (M7) shows values around zero and no systematic tendency (Figure 2), also justifying the use of the model. The Lag residuals graphical analysis in the mixed effect model (M7) (Figure 3) evidenced the broad decrease of autocorrelation (Lag 1 and Lag 2), what can be attributed to the random effects introduced in the model.

Evaluation of the calibration response according to plot
The estimated values of RMSE based on subsamples of trees including predictions with parameters of fixed effect (M1) and random effect (M7) models for different calibration alternatives in relation to the sampling effort are shown in Table 4. Among the choices for calibration, the best results were achieved with the measurement of dominant trees plus one randomly selected tree in the plot (dominant trees + 1 tree -RMSE = 0.556), and this calibration procedure was the most appropriate and practical among the ones assessed. This option was followed by the calibration involving (dominant trees + 3 trees -RMSE = 0.550) and (dominant trees + 6 trees -RMSE = 0.542). These last calibration options provide more accuracy but demand more time and financial resources since they needs a higher number of tree measurements in the plot.

DISCUSSION
The data evaluated for E. grandis in Mocuba District, Mozambique, showed a great growth potential of the species in the region. Maússe-Sitoe et al. (2016) emphasize the excellent initial species growth in the central and north regions of the country. When comparing Where: β 0 and β 1 = estimated regression coefficients, and β 0 and β 1 are fixed parameters indicating the intercept of the model; a1 and a2 = are fixed parameters, for dominant height (h 100 ) and basal area (G) respectively; RMSE = standard error of estimation; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion; p<0.0001 = probability significance p value (α = 5%).

FIGURE 2
Plot of residual distribution against number of observations for mixed models for Eucalyptus grandis.

FIGURE 3
Lag residuals of height for M7 with the inclusion of stand variables using the mixed effects approach for Eucalyptus grandis.
the fixed M1 (Table 1) with the mixed M7 (Table 3), a gain was obtained in precision and accuracy in the estimations of height when using mixed modelling. These models have a high predictability capacity and are adequate to explain hierarchic variance when compared to traditional regression techniques (Sharma and Parton, 2007;Adame et al., 2008;Mendonça et al., 2015;Ercanli et al., 2015). This fact means that the use of mixed modelling for the set of data of the present study has ensured precise height estimations for the stand and maintained coherence in the biological interpretation. The success of M7 in the modelling of hypsometric relation (h/d) is explained by the inclusion of stand variables (h100, G and dg) in the model structure. The stand variables are able to adequately reflect variability (heterogeneity) of trees sampled in different plots (Huang et al., 2000;Bartoszeck et al., 2004;Adame et al., 2008;Pretzsch, 2009;Gómez-García et al., 2015). For example, Zeide and Vanderschaaf (2002) found that stand density affects hypsometric relation (h/d) in a stand, in other words, trees of the same height generally have smaller diameters in dense stands. The incorporation of additional predictor variables can improve predictions but requires a greater sampling effort and limits model application. Several variables were proposed as additional predictor variables, such as stand age (Soares and Tomé, 2002;López Sánchez et al., 2003;Barros et al., 2004;Calama and Montero, 2004;Calegario et al., 2005;Castedo Dorado et al., 2006, Mendonça et al., 2015, dominant height (Corral-Rivas et al., 2014;Gómez-García et al., 2015), crown competition factor (Temesgen et al., 2007), geographical characteristics (Hynynen et al., 2002;Russell et al., 2010;Schmidt et al., 2011), or wind speed (Meng et al., 2008). Whereas the sampling trees used in the study were homogenous, however, some data found indicated a hierarchic structure. This problem designated as autocorrelation can cause an estimation error of the reliability intervals for equation parameters or affects negatively the reliability of results of regression models (Gómez-García et al., 2015;Ercanli et al., 2015). This decrease of variance was attributed to the random effect modelling, what in many cases resolves part of the correlation between observations (Vonesh and Chinchilli, 1997). However, Westfall (2010) states that heteroscedastic errors for the hypsometric relation (h/d) using nonlinear mixed effect models is associated with the different model structure and/or data origin.
On the other hand, decrease in RMSE with calibration was more evident when the mixed model used showed a predictive performance comparatively above the fixed model (traditional model) used according to the procedure adopted for calibration in the subsample (Table  4). The inclusion of random effect variables in the model structure responds to tree variability in the stand and in different time series in the subsample model (Trincado et al., 2007;Yang et al., 2009;Lejeune et al., 2009;Sharma and Parton, 2009). When evaluating the calibration responses with mixed models, the most important aspect is to decide the ideal number of trees to be used in the calculation of random effects in the plot, for instance, using three, four, five or any other number of trees, selected or chosen in the subsample. The calibration response based on the selection of (dominant trees + 1 tree) in the sampled plots resulted in less tendentious predictions when the practicality of field measurement was considered. The calibration response similar to the one used as a nonlinear mixed effect model was obtained by other authors such as Calama and Montero (2004) and Sharma and Parton (2009). According to Trincado et al. (2007), the use of a mixed model in forest inventories by selecting a subsample of dominant trees for height measurement, allows the maintenance of a simple model structure without the inclusion of predictor variables. However, studies by Castedo- Dorado et al. (2006), Crecente-Campo et al. (2010) and Paulo et al. (2011) verified that the best predictive sequences for the calibration response were obtained by selecting smaller trees in the sampling plots. The decisive questions and the evaluation process that are crucial for mixed effect models must consider some sampling alternatives, including the different selections of tree subsamples, in order to achieve the best predictive performance with these models Corral-Rivas et al., 2014;Ercanli et al., 2015). Moreover, this equation has the advantage that it can be used for prediction purposes with data commonly measured in forest inventories in Mozambique and can be easily implemented in disaggregated statistics in a model of dynamic growth based on the approach time and space according to Gómez-García et al. (2015).

CONCLUSION
The update of the nonlinear mixed effects model for estimation of the hypsometric relation proved above average in all statistics analyzed. These models are preferable in relation to the nonlinear traditional models since they produce estimations with high precision, are parsimonious, their parameters are interpretable, in addition to their broad applicability in any data base of forest origin.
The development of flexible equations that allow to describe the variations inherent to sites is advantageous to improve the prediction of tree height. Furthermore, the calibration that includes the prediction of random parameters that used a subsample of trees belonging to the stand can be suggested in the practical perspective to increase accuracy, precision of height estimations, decrease of sampling effort, and practicality in future forest inventories in Mozambique. Further works on recalibration, validation, and verification of our model using a larger dataset collected from a wider range of species distribution will be more interesting.