MULTILEVEL NONLINEAR MIXED-EFFECTS MODEL AND MACHINE LEARNING FOR PREDICTING THE VOLUME OF Eucalyptus SPP. TREES
DANTAS et al.

Volumetric equations are among the main tools for quantifying forest stand production and are the basis for the sustainable management of forest plantations. This study assessed the quality of volumetric estimates for Eucalyptus spp. trees obtained with a mixed-effects model, an artificial neural network (ANN) and a support-vector machine (SVM). The data came from a forest stand located in the municipalities of Bom Jardim de Minas, Lima Duarte and Arantina in Minas Gerais state, Brazil. The volume of 818 trees was accurately determined using Smalian's formula. The Schumacher and Hall model was fitted by fixed-effects regression and with multilevel random effects. The mixed model was fitted with 14 different structures of the variance and covariance matrix, and the best structure was selected based on the Akaike Information Criterion, the Maximum Likelihood Ratio Test and Vuong's Closeness Test. SVM construction and ANN training used diameter at breast height and total tree height as the independent variables. All techniques performed satisfactorily, with homogeneous distributions and low dispersion of residuals. The quality criteria indicated the superior performance of the mixed model with a Huynh-Feldt structure of the variance and covariance matrix, which reduced the mean relative error from 13.52% (fixed-effects model) to 2.80%, whereas the machine learning techniques had errors of 6.77% (SVM) and 5.81% (ANN). This study confirms that, although fixed-effects models are widely used in the Brazilian forest sector, more effective methods exist for modeling dendrometric variables.
CERNE, v.26 n.1, 2020


INTRODUCTION
One of the most important parameters for knowing the potential forest production of a region is log volume. Log volume is the starting point for assessing the wood stock of a forest stand, underpinning decisions on silvicultural treatments, logging and timber transport. Therefore, it must be determined correctly so that it represents the sampled population well.
The volumetric estimation of Eucalyptus spp. clones is usually based on equations in which diameter at breast height (DBH) and total tree height (H) are the independent variables. According to Campos and Leite (2009), the Schumacher and Hall model stands out in tree volume estimation. However, classical regression models assume independence between observations and homogeneity of variance, which, in some cases, may not be true.
An alternative for analyzing data correlated in space and/or time, and for explicitly modeling their covariance structure, is to use mixed models. Possible approaches with mixed models include generalized correlation and variance structures (Fu et al., 2017; Cropper and Putz, 2017; Özçelik et al., 2018; Wang et al., 2019). Mixed models are sophisticated regression techniques, and Lappi (1991) pioneered their use in forestry research.
Mixed models include unobservable variables, termed random effects, along with observable variables, termed fixed effects (Pinheiro and Bates, 2000). In addition, mixed models can describe incomplete blocks, split plots, spatiotemporal data and random coefficients, as well as polynomial and growth curves.
In forest science, mixed models are often applied to nonlinear problems, such as height growth, basal area increment of Eucalyptus spp. stands and genetic evaluations (Calegario et al., 2005; Barantal et al., 2019; Wang et al., 2019; Sharma et al., 2019).
In addition, artificial intelligence/machine learning approaches, including artificial neural networks (ANN) and support-vector machines (SVM), have been increasingly used for forest data analysis, modeling, variable estimation and production prognosis. These tools have improved the quality of estimates and predictions (Vendruscolo et al., 2015; Martins et al., 2016).
An ANN is an algorithm built from simple processing units (artificial neurons), inspired by the neurons of the human brain, each of which computes a specific function. These units are arranged in layers and connected through weights, which store the knowledge acquired from examples and weigh the inputs of each unit, making that knowledge available for use (Haykin, 2001; Silva et al., 2018).
The most striking features of ANNs are their learning and generalization capacities: having learned from examples, an ANN can generalize the assimilated knowledge to an unknown dataset. Another useful feature of ANNs is their ability to extract nonexplicit characteristics from the examples provided (Gorgens et al., 2009).
Support-vector machines (SVMs) have also become an interesting alternative for the mathematical modeling of complex systems. They are conceptually simple yet capable of solving highly complex real-world problems. In an SVM, input vectors are mapped nonlinearly into a high-dimensional feature space in which a linear decision surface is constructed, forming an optimal separating hyperplane; in binary classification, for example, this hyperplane separates positively and negatively labeled data with the largest possible margin (Shao et al., 2014).
Considering the relevance and sophistication of these techniques, this study aimed to assess the performance of the Schumacher and Hall nonlinear volumetric model, fitted by fixed- and mixed-effects regression with modeling of the variance and covariance matrix, of an artificial neural network and of a support-vector machine for estimating the log volume of Eucalyptus spp. trees.

Data
The study area comprises 21 management units planted with Eucalyptus spp. hybrid clones, located in the municipalities of Bom Jardim de Minas, Lima Duarte and Arantina, in Minas Gerais, Brazil, totaling 1,090 ha of inventoried area.
The climate of the region is subtropical highland (Cwb) according to the Köppen classification, with an average annual temperature of 20.1 °C, dry and cold winters with frost in some areas, and rainy summers with moderately high temperatures. The average total annual precipitation is 1,456 mm (Alvares et al., 2013).
The data used in this study were obtained by accurately measuring the volume of 818 trees of different ages and sizes. The variables measured were total height (Ht), in meters; diameter at breast height (DBH), in centimeters; and diameters at the base (0.1 m above ground), at heights of 0.5 m, 1 m, 1.5 m and 2 m, and every 2 m thereafter. Log volumes were calculated using Smalian's formula. Descriptive statistics of the data are reported in Table 1.
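A minimal Python sketch of Smalian's formula applied to a sectioned stem, assuming diameters in cm measured at the listed heights: each section contributes the mean of its two end cross-sectional areas times its length, and the section volumes are summed. The function names are hypothetical, and the optional cone for the tip above the last measurement is an assumption, since the text does not say how the top piece was treated.

```python
import math


def cross_section_area(d_cm):
    """Cross-sectional area in m^2 for a diameter given in cm."""
    return math.pi * d_cm ** 2 / 40000.0


def smalian_volume(heights_m, diameters_cm, total_height_m=None):
    """Stem volume (m^3) by Smalian's formula, V_i = (A_i + A_{i+1}) / 2 * L_i.

    heights_m: ascending measurement heights along the stem (m).
    diameters_cm: diameters (cm) measured at those heights.
    total_height_m: optional; if given, the tip above the last measurement
    is added as a cone (A_n * L / 3). This tip treatment is an assumption.
    """
    if len(heights_m) != len(diameters_cm) or len(heights_m) < 2:
        raise ValueError("need at least two matching height/diameter pairs")
    volume = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        a1 = cross_section_area(diameters_cm[i])
        a2 = cross_section_area(diameters_cm[i + 1])
        volume += (a1 + a2) / 2.0 * length
    if total_height_m is not None and total_height_m > heights_m[-1]:
        tip_length = total_height_m - heights_m[-1]
        volume += cross_section_area(diameters_cm[-1]) * tip_length / 3.0
    return volume


# Measurement heights follow the scheme in the text; diameters are made up.
heights = [0.1, 0.5, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0]
diameters = [22.0, 20.5, 19.8, 19.2, 18.9, 17.5, 16.0, 14.2]
print(round(smalian_volume(heights, diameters, total_height_m=20.0), 4))
```

A cylinder is a convenient check: with a constant diameter, every section is exact, so the sum equals area times length.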
(2017), where V is the volume in m³; DBH is the diameter, in cm, at 1.30 m above ground; Ht is the total height, in m; $\beta_0$, $\beta_1$ and $\beta_2$ are the model parameters; and $\varepsilon$ is the random error. Scatter plots of log volume as a function of DBH and Ht (Figure 1) indicate a nonlinear relationship between these variables.

Volumetric model
The nonlinear volumetric model of Schumacher and Hall (1933) (Equation 1) was fitted to the volume data. Processing was performed in R, version 3.4.1, a software environment for statistical computing.

$$V = \beta_0 \, DBH^{\beta_1} \, Ht^{\beta_2} + \varepsilon \qquad (1)$$
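The model's structure can be illustrated in Python by linearizing it: taking logarithms gives ln V = ln β0 + β1 ln DBH + β2 ln Ht, which ordinary least squares solves directly. This is a swapped-in technique, not the nonlinear least-squares fit described above; log-linear estimates assume a multiplicative error and are commonly used only as starting values for the nonlinear fit. Function names and the synthetic data are hypothetical.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-15:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_schumacher_hall_loglinear(dbh, ht, vol):
    """OLS on ln V = ln b0 + b1 ln DBH + b2 ln Ht; returns (b0, b1, b2)."""
    rows = [[1.0, math.log(d), math.log(h)] for d, h in zip(dbh, ht)]
    y = [math.log(v) for v in vol]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_b0, b1, b2 = solve_3x3(xtx, xty)
    return math.exp(ln_b0), b1, b2


def predict_volume(b0, b1, b2, dbh, ht):
    """Equation 1 without the error term."""
    return b0 * dbh ** b1 * ht ** b2


# Noise-free synthetic data generated from known (illustrative) parameters.
dbh = [10.0, 15.0, 20.0, 25.0, 30.0, 12.0]
ht = [15.0, 20.0, 25.0, 28.0, 30.0, 22.0]
vol = [predict_volume(5e-5, 1.9, 1.0, d, h) for d, h in zip(dbh, ht)]
b0, b1, b2 = fit_schumacher_hall_loglinear(dbh, ht, vol)
print(b0, b1, b2)
```

Because the synthetic volumes contain no noise, the fit should recover the generating parameters up to rounding.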
Multilevel mixed-effects model for volumetric estimation
Subsequently, to evaluate the use of mixed-effects models for estimating the log volume of Eucalyptus spp., the Schumacher and Hall model was refitted incorporating the variability of each tree and management unit, thereby generating a multilevel nonlinear mixed-effects model with fixed and random parameters. The model was fitted by maximum likelihood, a method originally proposed by Fisher (Searle, 1987), which obtains the estimators that maximize the likelihood of the observations, that is, their joint probability density viewed as a function of the fixed effects and variance components.
In the nonlinear mixed model (Equation 2), $y_{ij}$ is the response of the j-th tree in the i-th management unit, with i = 1, ..., m and j = 1, ..., $n_i$, where m is the total number of management units and $n_i$ is the number of trees in the i-th management unit; f is a general, real-valued, differentiable function of a group-specific parameter vector $\boldsymbol{\phi}_{ij}$ and a covariate vector $\mathbf{v}_{ij}$; and $\varepsilon_{ij}$ is the within-group random error, normally distributed (Pinheiro and Bates, 2000).

$$y_{ij} = f(\boldsymbol{\phi}_{ij}, \mathbf{v}_{ij}) + \varepsilon_{ij} \qquad (2)$$
The parameter vector varies from individual to individual. In a second stage, $\boldsymbol{\phi}_{ij}$ can be expressed by Equation 3, where $\boldsymbol{\beta}$ is the (p × 1) fixed-effects vector; $\mathbf{b}_i$ is the ($q_1$ × 1) vector of first-level (management unit) random effects, independently distributed with variance-covariance matrix $\boldsymbol{\Psi}_1$; $\mathbf{b}_{ij}$ is the ($q_2$ × 1) vector of second-level (tree) random effects, independently distributed with variance-covariance matrix $\boldsymbol{\Psi}_2$ and assumed independent of the first-level random effects; $\mathbf{A}_{ij}$, $\mathbf{B}_{i,j}$ and $\mathbf{B}_{ij}$ are incidence matrices; and the within-group errors $\varepsilon_{ij}$ are independently distributed and independent of the random effects.
$$\boldsymbol{\phi}_{ij} = \mathbf{A}_{ij}\boldsymbol{\beta} + \mathbf{B}_{i,j}\mathbf{b}_i + \mathbf{B}_{ij}\mathbf{b}_{ij} \qquad (3)$$
A key step in mixed modeling is defining the variance and covariance structure, the aim being a parsimonious structure that explains the data variability and the correlation between measurements with a small number of parameters (Clark and Linzer, 2012). This choice may directly affect parameter estimates, standard errors of fixed and random effects, diagnostics and inferences, and it depends on the data structure, empirical information and available computational resources.
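Equation 3 can be illustrated with a small Python sketch, assuming (hypothetically) that only $\beta_0$ receives random effects at both levels, so the incidence matrices simply select the first parameter. Setting the random effects to zero gives the population-level (fixed-effects) prediction; adding the management-unit and tree effects gives the unit- and tree-level predictions. All numeric values and function names are illustrative, not estimates from this study.

```python
def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def tree_parameters(A, beta, B_unit, b_unit, B_tree, b_tree):
    """Equation 3: phi_ij = A_ij beta + B_i,j b_i + B_ij b_ij."""
    parts = (mat_vec(A, beta), mat_vec(B_unit, b_unit), mat_vec(B_tree, b_tree))
    return [sum(values) for values in zip(*parts)]


def schumacher_hall(phi, dbh, ht):
    """Equation 1 evaluated with a (possibly tree-specific) parameter vector."""
    b0, b1, b2 = phi
    return b0 * dbh ** b1 * ht ** b2


# Hypothetical setup: random effects on beta0 only (q1 = q2 = 1).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [0.0]]
beta = [5e-5, 1.9, 1.0]
b_unit, b_tree = [2e-6], [-1e-6]

phi_pop = tree_parameters(A, beta, B, [0.0], B, [0.0])
phi_unit = tree_parameters(A, beta, B, b_unit, B, [0.0])
phi_tree = tree_parameters(A, beta, B, b_unit, B, b_tree)
for label, phi in [("population", phi_pop), ("unit", phi_unit), ("tree", phi_tree)]:
    print(label, schumacher_hall(phi, 20.0, 25.0))
```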
Variance and covariance structures were introduced into the Schumacher and Hall nonlinear mixed model because trees belonging to the same management unit are likely to be more correlated with each other than with trees from other units. This processing was performed with the nlme package in R, using its correlation argument (Pinheiro and Bates, 2000). In total, 14 structures were tested: Variance Components (VC), UNstructured (UN), Compound Symmetry (CS), First-Order Autoregressive (AR(1)), Heterogeneous First-Order Autoregressive (ARH(1)), Heterogeneous Compound Symmetry (HCS), Toeplitz (TOEP), First-Order AutoRegressive Moving Average (ARMA(1,1)), Heterogeneous Toeplitz (TOEPH), First-Order Ante-dependence (ANTE(1)), UNstructured Correlations (UNR), Spatial Power (SP(POW)(c-list)), Banded Main Diagonal (UN(1)) and Huynh-Feldt (H-F). More details on these structures are available in Pinheiro and Bates (2000).
The structure of the variance and covariance matrix was chosen using the Akaike Information Criterion (AIC) (Sakamoto et al., 1986), the Maximum Likelihood Ratio Test (MLRT) (Pinheiro and Bates, 2000) and Vuong's Closeness Test (Vuong, 1989). For the AIC (Equation 4), the best model is the one with the lowest value:
$$\mathrm{AIC} = -2\ln(ml) + 2p \qquad (4)$$
where ln is the natural (Napierian) logarithm, ml is the maximum likelihood value, and p is the number of model parameters.
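Equation 4 and the lowest-AIC rule can be sketched in Python as follows. The function names are hypothetical, and the log-likelihoods and parameter counts in the usage example are placeholders, not the values in Table 3.

```python
def aic(log_lik, n_params):
    """Equation 4: AIC = -2 ln(ml) + 2p, where log_lik = ln(ml)."""
    return -2.0 * log_lik + 2.0 * n_params


def select_by_aic(candidates):
    """candidates maps structure name -> (log-likelihood, number of parameters).

    Returns the name with the lowest AIC and the full AIC table.
    """
    table = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
    best = min(table, key=table.get)
    return best, table


# Hypothetical values for illustration only.
candidates = {"CS": (1500.0, 6), "AR(1)": (1502.0, 6), "UN": (1510.0, 20)}
best, table = select_by_aic(candidates)
print(best, table)
```

Note how the penalty term works: "UN" has the highest log-likelihood in this toy example but loses on AIC because of its 20 parameters.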
Under the null hypothesis, the models are equivalent at significance level $\alpha$; $Z_{1-\alpha/2}$ is the critical value of the standard normal distribution, and the null hypothesis is rejected if $|T_{RLNN}| > Z_{1-\alpha/2}$.

Machine learning
SVMs and ANNs were used as machine learning approaches. SVM construction followed the supervised learning process described in detail by Haykin (2001), using a set of n samples represented as ordered pairs (X, Y), where X is the matrix of explanatory variables and Y is the vector of expected values. From this information, a function is chosen that predicts the expected value of a sample from its input feature vector. This linear function is represented by $f(X) = \langle W, X \rangle + b$, where W is a vector of weights and b is the bias.
The type IV error function, also known as eps-regression (ε-insensitive loss), was used, with a radial basis function (RBF) kernel. Kernel functions implicitly project the data into a high-dimensional feature space, increasing the computational power of learning machines and making it possible to represent nonlinear phenomena (Granata et al., 2016). This procedure was performed in R, version 3.4.1, using the package e1071 (Meyer et al., 2019).
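The components of an ε-regression SVM with an RBF kernel can be sketched in plain Python: the kernel K(x, z) = exp(-γ‖x - z‖²), the ε-insensitive loss, and the prediction of a trained model as a kernel-weighted sum over support vectors plus an intercept. Solving the quadratic program that produces the support vectors and coefficients is not shown; those inputs, and the function names, are hypothetical placeholders. The ε and γ values are the optimal ones reported in the Results.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y, y_hat, epsilon):
    """Zero inside the tube |y - y_hat| <= epsilon, linear outside it."""
    return max(0.0, abs(y - y_hat) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b for a trained eps-SVR."""
    return intercept + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    )


gamma, epsilon = 0.1535, 0.2441
# Hypothetical support vectors (normalized DBH, Ht) and coefficients.
svs = [[0.2, 0.3], [0.6, 0.7], [0.9, 0.8]]
coefs = [0.8, -0.5, 1.1]
print(svr_predict([0.5, 0.5], svs, coefs, 0.1, gamma))
print(eps_insensitive_loss(1.0, 1.5, epsilon))
```

The loss explains why only some training points become support vectors: points predicted within ε of their target contribute nothing to the objective.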
The trained ANNs were Multilayer Perceptron (MLP) networks with an input layer, one intermediate (hidden) layer and an output layer. The training algorithm was resilient backpropagation, with the learning rate adjusted automatically by the neuralnet package within the range 0.01 to 1.12. The number of neurons in the hidden layer was chosen by k-fold cross-validation, which randomly subdivides the database into k subgroups (Wong et al., 2017). Here k = 10, so in each round 90% of the data were used for training and 10% for testing (Diamantopolou, 2010). Networks with 1 to 20 hidden neurons were tested.
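The k-fold selection described above can be sketched in Python, assuming a user-supplied (hypothetical) callback `fit_and_score` that trains a candidate on the training indices and returns its mean squared error on the held-out fold. With k = 10, each candidate (for example, a hidden-layer size from 1 to 20) is trained ten times on 90% of the data and tested on the remaining 10%; the same loop could drive a grid of SVM parameters such as epsilon and gamma.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition range(n) into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_select(candidates, n, k, fit_and_score, seed=0):
    """Return the candidate with the lowest mean validation error over k folds."""
    folds = k_fold_indices(n, k, seed)
    mean_errors = {}
    for cand in candidates:
        errors = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            errors.append(fit_and_score(cand, train_idx, test_idx))
        mean_errors[cand] = sum(errors) / len(errors)
    best = min(mean_errors, key=mean_errors.get)
    return best, mean_errors


# Toy stand-in for training a network with c hidden neurons; its "validation
# error" is smallest at c = 7. Replace with real training and evaluation.
def toy_fit_and_score(c, train_idx, test_idx):
    return float((c - 7) ** 2)


best, errors = cv_select(range(1, 21), 818, 10, toy_fit_and_score)
print(best)
```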
The activation function was the logistic (sigmoid) function, whose outputs lie between 0 and 1; this bounded range limits the amplitude of outputs and inputs. Therefore, the data were normalized by transforming the values of each variable to the range from 0 to 1 using Equation 10 (Soares et al., 2011). This equation uses the minimum and maximum of each variable, preserving the original data distribution (Valença, 2010):
$$x' = a + \frac{(x - x_{min})(b - a)}{x_{max} - x_{min}} \qquad (10)$$
where x' is the normalized value; x is the original value; $x_{min}$ is the minimum value of the variable; $x_{max}$ is the maximum value of the variable; a is the lower limit of the normalization range; and b is the upper limit of the normalization range.
The maximum likelihood ratio test (MLRT) (Equation 5) compares models pairwise, its value being computed from the difference between the log-likelihoods of the two models (Pinheiro and Bates, 2000):
$$LR = 2\ln\left(\frac{L_2}{L_1}\right) = 2\left[\ln(L_2) - \ln(L_1)\right] \qquad (5)$$
where ln is the Napierian logarithm, $L_2$ is the value of the maximum likelihood function of model 2, and $L_1$ is the value of the maximum likelihood function of model 1.
To compare the models with Vuong's likelihood ratio test (Vuong, 1989), the statistic $T_{RLNN}$ in Equation 6 was used:
$$T_{RLNN} = \frac{LR_n}{\sqrt{n}\,\hat{\omega}_n} \qquad (6)$$
where $\hat{\omega}_n^2$ is an estimator of the variance of the pointwise log-likelihood ratio $\ln[f(y_i)/g(y_i)]$ between the two competing models f and g, and $LR_n = \sum_{i=1}^{n} \ln[f(y_i)/g(y_i)]$ is the likelihood ratio statistic. Under the null hypothesis, the statistic is asymptotically standard normal.
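Equations 5 and 6 can be sketched in Python as below. `vuong_test` takes the pointwise log-likelihoods of two competing models for the same observations (an assumed input format) and uses the 1/n variance estimator for ω̂²; `lr_pvalue_one_df` covers only the special case of nested models differing by one parameter, where the χ² tail probability has a closed form. All names are hypothetical.

```python
import math
from statistics import NormalDist


def likelihood_ratio(log_lik_1, log_lik_2):
    """Equation 5: LR = 2 [ln L2 - ln L1] for model 1 nested within model 2."""
    return 2.0 * (log_lik_2 - log_lik_1)


def lr_pvalue_one_df(lr):
    """Upper-tail chi-square(1) probability: P(chi2_1 > lr) = erfc(sqrt(lr / 2))."""
    return math.erfc(math.sqrt(lr / 2.0))


def vuong_test(pointwise_ll_1, pointwise_ll_2, alpha=0.05):
    """Equation 6: T = LR_n / (sqrt(n) * omega_hat).

    m_i = ln f(y_i) - ln g(y_i); LR_n = sum(m_i); omega_hat^2 is the
    1/n variance of m_i. Returns (T, critical value, reject H0?).
    """
    m = [a - b for a, b in zip(pointwise_ll_1, pointwise_ll_2)]
    n = len(m)
    mean_m = sum(m) / n
    omega2 = sum((mi - mean_m) ** 2 for mi in m) / n
    if omega2 == 0.0:
        raise ValueError("pointwise log-likelihood differences have zero variance")
    t = sum(m) / (math.sqrt(n) * math.sqrt(omega2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return t, z, abs(t) > z


lr = likelihood_ratio(1500.0, 1503.0)  # hypothetical log-likelihoods
print(lr, lr_pvalue_one_df(lr))
print(vuong_test([-1.0, -0.5, -0.8, -1.2], [-1.1, -0.9, -0.7, -1.6]))
```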
The ANN training process stopped at a maximum of 100,000 cycles or when the mean squared error fell below 1%, whichever occurred first. At the end of training, the best ANN was selected based on the smallest mean squared error.
The data were divided into two groups: 70% were used to fit the nonlinear volumetric models, in their fixed and mixed forms, to construct the SVM and to train the ANN, and 30% were used to validate the generalization of the techniques. Of the data intended for ANN training, 70% were used in the training phase and 30% in the testing phase.

RESULTS AND DISCUSSION
Multilevel mixed-effects model
Table 2 reports the fitted parameters of the Schumacher and Hall nonlinear model; all coefficients were significant at the 0.05 level according to Student's t-test.
After the initial fit of the Schumacher and Hall nonlinear model, the mixed models were fitted by including multilevel random effects and considering 14 structures of the variance and covariance matrix. Table 3 presents the selection criteria for these structures. The structure that best fit the volumetric estimation of Eucalyptus spp. was Huynh-Feldt (HF), which had the lowest AIC value and the highest log-likelihood (LogLik) value. In addition, HF was compared with each of the other structures by the likelihood ratio test, with HF as the alternative hypothesis.
Table 3 indicates that almost all null hypotheses were rejected (p-value < 0.05), except for the Heterogeneous Compound Symmetry (HCS) (p-value = 0.563) and UNstructured (UN) (p-value = 0.632) structures, which did not differ significantly from HF. Thus, any of these three structures of the variance and covariance matrix would be adequate for the dataset. According to West et al. (2015), the HF structure is characterized by unequal variances between management units, with each covariance equal to the arithmetic mean of the two corresponding variances minus λ, where λ is the difference between the mean variance and the mean covariance. In the HCS structure, the variances differ and the covariances are obtained from a correlation coefficient between individuals. In the Unstructured structure, a distinct variance and covariance is estimated for each occasion.

The HF structure was chosen based on the AIC and LogLik values.
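The Huynh-Feldt structure described by West et al. (2015) can be checked numerically with a short Python sketch: each covariance is the mean of the two corresponding variances minus λ, and, as a consequence, λ equals the mean variance minus the mean covariance. The function names, variances and λ in the example are arbitrary illustrative values.

```python
def huynh_feldt_cov(variances, lam):
    """HF covariance: sigma_ii = s_i^2 and sigma_ij = (s_i^2 + s_j^2) / 2 - lam."""
    t = len(variances)
    return [
        [variances[i] if i == j else (variances[i] + variances[j]) / 2.0 - lam
         for j in range(t)]
        for i in range(t)
    ]


def mean_var_minus_mean_cov(cov):
    """Mean of the diagonal minus mean of the off-diagonal elements."""
    t = len(cov)
    mean_var = sum(cov[i][i] for i in range(t)) / t
    off = [cov[i][j] for i in range(t) for j in range(t) if i != j]
    return mean_var - sum(off) / len(off)


cov = huynh_feldt_cov([1.0, 2.0, 3.0], 0.5)
for row in cov:
    print(row)
print(mean_var_minus_mean_cov(cov))  # recovers lambda
```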
According to West et al. (2015), variance and covariance matrix structures increase the flexibility with which correlations are modeled. In any data analysis, the most appropriate and parsimonious structure for these matrices should be chosen based on the observed data and on the relationships between observations within each sampling unit (here, the management unit), because each structure requires a different number of variance and covariance parameters to be estimated. Table 4 outlines the fitted fixed parameters of the Schumacher and Hall nonlinear mixed model obtained by maximum likelihood with the HF structure of the variance and covariance matrix; all parameters were significant at the 0.001 level according to the maximum likelihood test.

Machine learning
The configurations used to construct the SVM are reported in Table 5. Optimizing the parameters of an SVM model is fundamental for obtaining a final model with high predictive performance, and tuning the gamma parameter of the radial basis function and the epsilon parameter of the loss function improves model performance. Epsilon sets the width of the insensitive band within which residuals are not penalized, whereas gamma determines the width of the radial basis function, reducing or increasing model complexity (Cherkassky and Mulier, 1998).
In this work, a two-step grid search combined with cross-validation was used for the global optimization of these parameters. For each combination of parameters, the mean squared error (MSE) was calculated, and the combination producing the smallest MSE was selected. The optimal values of epsilon and gamma were 0.2441 and 0.1535, respectively.
Regarding the artificial neural network approach, the ANN with the smallest error among those evaluated was obtained after 23,041 iterations. The architecture and weights of this network, which has seven neurons in the hidden layer, are shown in Figure 2. From this 2-7-1 network, a system of equations was extracted to predict the individual volume of Eucalyptus spp. trees, with coefficients given by the weights generated by the ANN. This system was used to predict the volume of the trees in the validation dataset.
Model 16 expresses the relationship between the hidden layer and the response variable, where $\beta_0$ is the bias and the other coefficients are the weights associated with each neuron. Model 17 is the activation function used in each hidden-layer neuron, derived from the logistic model. Finally, Model 18 relates the input variables to the hidden-layer neurons, one such model being generated for each neuron:
$$\hat{V} = \beta_0 + \sum_{n=1}^{7} \beta_n z_n \qquad (16)$$
$$z_n = \frac{1}{1 + e^{-w_n}} \qquad (17)$$
$$w_n = \alpha_{0n} + \sum_{k=1}^{2} \alpha_{kn} x_k \qquad (18)$$
where $\hat{V}$ is the network output (predicted volume on the normalized scale); $\alpha_{0n}$ is the bias of neuron n; $\beta_n$ is the coefficient associated with neuron n; $\alpha_{kn}$ is the coefficient between input variable $x_k$ (DBH or Ht) and neuron n; $z_n$ is the response of the n-th neuron of the hidden layer; and $w_n$ is the sum of the products between the weights and the inputs of neuron n.
The coefficients of the system of equations extracted from the artificial neural network are presented in Table 6.
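The equation system of Models 16 to 18 amounts to a forward pass through the 2-7-1 network, sketched below in Python together with the min-max normalization of Equation 10 and its inverse. It assumes a linear output neuron (as Model 16 suggests) and inputs ordered as (DBH, Ht); the variable ranges and weights in the usage example are hypothetical placeholders, not the coefficients in Table 6.

```python
import math


def normalize(x, x_min, x_max, a=0.0, b=1.0):
    """Equation 10: rescale x from [x_min, x_max] to [a, b]."""
    return a + (x - x_min) * (b - a) / (x_max - x_min)


def denormalize(x_norm, x_min, x_max, a=0.0, b=1.0):
    """Inverse of normalize, mapping a network output back to original units."""
    return x_min + (x_norm - a) * (x_max - x_min) / (b - a)


def logistic(w):
    """Model 17: z = 1 / (1 + exp(-w))."""
    return 1.0 / (1.0 + math.exp(-w))


def mlp_predict(inputs, hidden_bias, hidden_weights, out_bias, out_weights):
    """Forward pass of a 2-n-1 MLP with logistic hidden units and linear output.

    Model 18: w_n = alpha_0n + sum_k alpha_kn * x_k  (hidden_bias[n], hidden_weights[n][k])
    Model 16: V = beta_0 + sum_n beta_n * z_n         (out_bias, out_weights[n])
    """
    z = [
        logistic(a0 + sum(a * x for a, x in zip(row, inputs)))
        for a0, row in zip(hidden_bias, hidden_weights)
    ]
    return out_bias + sum(b * zn for b, zn in zip(out_weights, z))


# Hypothetical ranges and weights for illustration (not Table 6).
dbh_range, ht_range, vol_range = (5.0, 35.0), (8.0, 35.0), (0.01, 1.2)
x = [normalize(20.0, *dbh_range), normalize(25.0, *ht_range)]
hidden_bias = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3]
hidden_weights = [[0.5, 0.4], [-0.3, 0.8], [0.7, -0.2], [0.2, 0.2],
                  [-0.6, 0.5], [0.4, -0.4], [0.9, 0.1]]
out_weights = [0.3, -0.2, 0.4, 0.1, -0.1, 0.2, 0.25]
v_norm = mlp_predict(x, hidden_bias, hidden_weights, -0.1, out_weights)
print(denormalize(v_norm, *vol_range))
```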

Estimate quality assessment
The techniques analyzed were applied to the validation dataset and performed satisfactorily in modeling the volume of Eucalyptus spp. trees, with homogeneous distributions and low residual dispersion, as shown in Figure 3.
The graphical analysis highlighted the efficiency of the nonlinear mixed-effects model with an HF structure of the variance and covariance matrix and of the machine learning techniques, which are still little used in the Brazilian forest sector; their marked gain in prediction accuracy indicates their potential.
Including the variability between and within individual trees and between management units in the nonlinear mixed-effects regression model provided better performance than the nonlinear fixed-effects regression model. The mixed-effects model was considerably more accurate, with well-distributed residuals centered around zero. The smallest log volumes tended to have larger errors, albeit within the range of -10% to 15%. Conversely, although all parameters of the nonlinear fixed-effects model were significant at the 0.05 level by Student's t-test, its residuals showed high dispersion and greater heterogeneity of variance, with a tendency to overestimate the smallest log volumes and to underestimate intermediate ones. Residuals of the nonlinear fixed-effects model generally ranged from -30% to 20%, with one value reaching -46%.
Thus, nonlinear mixed-effects models are important techniques for growth and production modeling. They account for the different levels of hierarchy within a dataset and can provide predictions specific to each level (Temesgen et al., 2008; Ou et al., 2016). They also capture information from the various sources of heterogeneity and correlation present in the data (Hall and Clutter, 2004), making them an efficient option for forest volume and biomass estimation.
Comparing the machine learning techniques, the graphs show residuals within the range of -20% to 10% for both. However, the ANN performed slightly better, with residuals more concentrated around zero, whereas the SVM showed a greater tendency toward error at the smallest and largest volumes.
The performance evaluation criteria for the techniques analyzed in this study are reported in Table 7; they refer to the predictions made by each technique on the validation data.
Table 7. Performance evaluation criteria of the predictions of nonlinear fixed-effects (MNL) and mixed-effects (MNLM) models, support-vector machine (SVM), and artificial neural network (ANN).
The predictions of the techniques analyzed in the validation phase were strongly correlated with the observed values. The lower the RMSE, the higher the accuracy of the estimates; the optimal situation is an RMSE of zero. Bias indicated slight overestimation trends for the nonlinear models and slight underestimation for the machine learning techniques. The nonlinear mixed-effects model presented the smallest bias, -1.2371%, indicating that it is a balanced (nearly unbiased) and effective tool. The mean relative error was greater than 13% for the nonlinear fixed-effects model and less than 7% for the other techniques, with the lowest value, 2.8035%, for the nonlinear mixed-effects model.
One advantage of random-effects models over fixed-effects models is a reduced residual standard error. Calegario et al. (2005) studied the basal area of Eucalyptus spp. clones and observed an approximately 15-fold decrease. In this study, the residual standard error also decreased markedly, from 0.0142 in the nonlinear fixed-effects regression model to 0.0003 in the nonlinear mixed-effects regression model, that is, a 53-fold decrease. Carvalho et al. (2011), applying mixed-effects modeling to basal area and volume prediction, found a decrease in error from 15% to 12% in basal area prediction and from 26% to 4% in volume prediction.
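The criteria discussed above can be sketched in Python as follows. The exact formulas behind Table 7 are not given in this section, so these common definitions are an assumption: Pearson's r; RMSE and bias as percentages of the observed mean, with bias computed as observed minus predicted (so negative values indicate overestimation, matching the sign convention in the text); and the mean relative error as the mean absolute error relative to each observed value. The function name and example data are hypothetical.

```python
import math


def quality_metrics(observed, predicted):
    """Assumed definitions of r, RMSE%, bias% and MRE% for validation data."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    resid = [o - p for o, p in zip(observed, predicted)]  # observed - predicted
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_obs) ** 2 for o in observed)
    var_p = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "r": cov / math.sqrt(var_o * var_p),
        "rmse_pct": 100.0 * math.sqrt(sum(e ** 2 for e in resid) / n) / mean_obs,
        "bias_pct": 100.0 * (sum(resid) / n) / mean_obs,  # < 0: overestimation
        "mre_pct": 100.0 * sum(abs(e) / o for e, o in zip(resid, observed)) / n,
    }


# Made-up volumes (m^3) for illustration only.
obs = [0.12, 0.25, 0.31, 0.48, 0.55]
pred = [0.13, 0.24, 0.33, 0.47, 0.57]
print(quality_metrics(obs, pred))
```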
The ANN was able to explain almost all the variation in the volume of Eucalyptus spp. trees with the available variables. Several studies have shown satisfactory performance of artificial neural networks (Özçelik et al., 2013; Vendruscolo et al., 2015; Martins et al., 2016). This performance results from the ability of neural networks to detect implicit information and nonlinear relationships between the response and explanatory variables provided as examples, and to generalize the assimilated knowledge to an unknown dataset.
Although it performed slightly worse than the ANN, the SVM was highly efficient at estimating the volume of Eucalyptus spp. trees. Unlike the ANN, the SVM requires no post-training evaluation to select the best model, thanks to the quadratic optimization performed during SVM training (Yang et al., 2015). This optimization yields the same result for a given configuration whenever it is applied to the same dataset. The ANN, by contrast, has more tunable elements, in addition to the random initialization of its parameters (Haykin, 2001), so each trained network produces slightly different estimates even with the same architecture. This makes SVMs more practical than ANNs, since they avoid the operator subjectivity involved in choosing the best network to apply to the database.
Both the machine learning and the nonlinear mixed-effects approaches were efficient in modeling the log volume of Eucalyptus spp. trees. The small part of the volume variation left unexplained by the study variables results from various factors not considered here that are known to affect tree volume in forests, such as biotic and abiotic factors and their interactions (Tanaka et al., 2017).
The Schumacher and Hall nonlinear multilevel mixed-effects model with an HF structure of the variance and covariance matrix was the most efficient in modeling the log volume of Eucalyptus spp. clones, because this type of model estimates fixed effects, predicts random effects and estimates variance components, accounting for the variability of each tree and among the different management units studied. Morphological differences between individuals, along with differences between management units caused by climate and other environmental factors, call for separate equations, since tree development may vary from location to location and from region to region.
Studies have indicated the importance of, and the gain in precision from, including random effects when modeling forest structures. Ou et al. (2016) concluded that adding topographic variables (elevation, slope and aspect) as random effects to a nonlinear mixed model using height and DBH as predictors improved the AIC and BIC values. Huff et al. (2018) analyzed the performance of nonlinear mixed-effects models for predicting total aboveground biomass, and the results showed that including shrub species as a random effect provided better performance than nonlinear fixed-effects models.
Many forest management decisions are based on knowledge of wood volume availability in a stand and on growth and productivity projections, and nonlinear mixed-effects models can be employed to support them.

CONCLUSION
The present study considerably improves the modeling of the log volume of Eucalyptus spp. trees using a nonlinear multilevel mixed-effects model and machine learning. All techniques performed satisfactorily, and the Schumacher and Hall nonlinear multilevel mixed-effects model with a Huynh-Feldt structure of the variance and covariance matrix predicted the log volume of Eucalyptus spp. trees more accurately than the fixed-effects regression model alone, the artificial neural network and the support-vector machine. Its ability to account for various sources of heteroscedasticity in the data through random effects makes the nonlinear mixed-effects model an efficient option for predicting the volume of Eucalyptus spp. trees, and its application is recommended given the marked gain in precision.