Modeling of stem form and volume through machine learning

Taper functions and volume equations, whose theory is well consolidated, are essential for estimating individual tree volume. At the same time, mathematical innovation is dynamic and may improve forestry modeling. The objective of this work was to analyze the accuracy of machine learning (ML) techniques relative to a volumetric model and a taper function for black wattle (acácia negra). We used stem analysis (cubing) data and fitted the Schumacher and Hall volumetric model and the Hradetzky taper function, comparing them to three algorithms: k nearest neighbor (k-NN), Random Forest (RF) and Artificial Neural Networks (ANN) for estimating total volume and diameter at relative heights. Models were ranked according to error statistics, and their residual dispersion was examined. The Schumacher and Hall model and the ANN showed the best results for volume estimation as a function of dap and height. Machine learning methods were more accurate than the Hradetzky polynomial for estimating stem form. ML models proved appropriate as an alternative to traditional modeling applications in forest measurement; however, they must be applied with care because fit-based overtraining is likely.


INTRODUCTION
Precision and accuracy in measuring the volume of individual trees in commercial forests are crucial, and stem analysis (cubing) is the most used method for this purpose. Since quantification by cubing is expensive in time and cost, indirect methods such as taper functions are usually applied for its estimation. Taper functions, volume equations and form factors are commonly developed to assess merchantable tree volumes (Clutter 1980), and are increasingly used in recent global studies of forest productivity (Liang et al. 2016).
As an alternative to the modeling usually applied in forestry, mathematical techniques such as Artificial Intelligence (AI) appear as an option in the field of estimation. Artificial intelligence has a subdivision known as machine learning (ML), which rests on the principle of inductive reasoning, that is, the approximation of a function from acquired knowledge (Faceli et al. 2011).
The nearest neighbor algorithm is the simplest of all ML algorithms and is based on the classification or estimation of a particular attribute considering the distance of the k nearest neighbors (Faceli et al. 2011). It is also an instrument to estimate missing values in databases, besides being versatile, flexible and simple, and not having to meet regression assumptions (Sanquetta et al. 2015b).
The RF algorithm composes a group of randomly trained regression trees, splitting explanatory variables according to their equality and inequality (Wang and Witten 1996, Breiman 2001). However, it has the disadvantage that its predictive behavior is difficult to interpret and its parameters cannot be analyzed (Moreno-Fernández et al. 2015).
ANNs are computational models based on the functioning of the human brain. Haykin (2001) defines an ANN as a processor structured in parallel, distributed and interconnected units that acquires knowledge and then generalizes it to an unexplored database.
The Random Forest technique has had limited application in forest research, but has seen increasing use in recent studies (Liang et al. 2016). In comparison, k-NN is mostly used in remote sensing studies, while ANNs are more widely diffused in forest modeling than the other algorithms cited (Tatsumi et al. 2015, Were et al. 2015).
Since such methods still lack baselines and basic studies in the forest area, it is indispensable to evaluate them in order to improve the efficiency of forest modeling, focusing on the reliability of population estimators. Therefore, the aim of the present work is to evaluate the accuracy of the ML techniques Random Forest, ANN and k nearest neighbor and to compare them to a traditional volumetric model and a taper function for Acacia mearnsii De Wild, in order to assess whether predictions can be improved through machine learning techniques.

MATERIALS AND METHODS
We studied Acacia mearnsii De Wild., popularly known as black wattle (acácia negra). Plantations are located in the state of Rio Grande do Sul (RS), distributed over three regions: Piratini, Cristal and Encruzilhada do Sul.
The planting areas have a humid subtropical Cfa climate according to the Köppen classification. Rain is distributed throughout the year, with greater occurrence in summer. The average temperature is below 18 ºC in the coldest month and above 22 ºC in the hottest month, with few frosts and hot summers (Brasil 1992).
We sampled four stands in each of the regions so as to cover the whole crop rotation (10 years). Stands of 1, 2, 5 and 10 years were sampled in Cristal; 1, 2, 5 and 9 years in Piratini; and 1, 3, 5 and 10 years in Encruzilhada do Sul.
In each stand, four circular plots of 10 m diameter were randomly installed, and tree diameters were measured at 1.30 m above the ground.
We used 60% of the database to train the algorithms and 40% to validate the models. This split was made by preserving the fraction of trees in each age class.
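As a hedged illustration of this split, the sketch below divides a hypothetical tree list 60/40 while preserving the fraction of trees in each age class; the data, function name and seed are invented for this example, not taken from the study.

```python
# Sketch of a 60/40 split stratified by age class (hypothetical data).
import random

def stratified_split(records, key, train_frac=0.6, seed=42):
    """Split records into train/validation, keeping each class's proportion."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    train, valid = [], []
    for recs in groups.values():
        rng.shuffle(recs)                       # randomize within each age class
        cut = round(len(recs) * train_frac)     # 60% of this class to training
        train.extend(recs[:cut])
        valid.extend(recs[cut:])
    return train, valid

# 4 age classes x 10 trees each -> 24 training and 16 validation trees
trees = [{"age": a, "dap": 5 + i} for a in (1, 2, 5, 10) for i in range(10)]
train, valid = stratified_split(trees, "age")
print(len(train), len(valid))  # 24 16
```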
The Schumacher and Hall (1933) model (Equation 1) was fitted using the model and reg procedures of the SAS software (2002), in order to verify the regression assumptions:

v = β0 · dap^β1 · ht^β2 · εi (1)

where: v = total stem volume (m³); dap = diameter at 1.30 m above the ground (cm); ht = total height (m); βn = model coefficients; εi = random error.

We repeated the fitting process 10 times, applying a different data distribution each time and refitting the model during the tests. Age, dap, total height and relative height were used as explanatory variables for estimating diameter along the stem.
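Equation 1 is commonly linearized with logarithms and fitted by ordinary least squares. The sketch below illustrates that procedure on synthetic data; the coefficients used to generate the "observed" volumes are invented and are not the values fitted in this study.

```python
# Hedged sketch: log-linear least-squares fit of v = b0 * dap^b1 * ht^b2.
import numpy as np

dap = np.array([6.0, 8.0, 10.0, 12.0, 14.0])   # cm (synthetic)
ht  = np.array([7.0, 9.0, 12.0, 14.0, 16.0])   # m  (synthetic)
v   = 0.00006 * dap**1.8 * ht**1.1             # synthetic "observed" volumes (m3)

# ln(v) = ln(b0) + b1*ln(dap) + b2*ln(ht)
X = np.column_stack([np.ones_like(dap), np.log(dap), np.log(ht)])
coef, *_ = np.linalg.lstsq(X, np.log(v), rcond=None)
b0, b1, b2 = np.exp(coef[0]), coef[1], coef[2]
print(b0, b1, b2)  # recovers the generating coefficients
```

In practice the back-transformed estimates may need a logarithmic bias correction; the SAS procedures referenced in the text handle the diagnostics of the regression assumptions.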
ML models were also trained to obtain volume with the variables age, dap, total height and relative height, but using the logic of accumulated volume up to a given relative height, according to Equation 3.
vi = Σ (gi · Li), i = 1, ..., Nt (3)

where: vi = stem volume at height i (m³); gi = cross-sectional area of the wood pieces measured (m²); Li = length of the wood pieces measured in the cubing (m); Nt = number of wood pieces.
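Equation 3 can be sketched directly: each piece contributes its cross-sectional area times its length, and the running sum gives the accumulated volume up to each height. The diameters and lengths below are invented for illustration.

```python
# Sketch of Equation 3: accumulated stem volume from measured pieces.
import math

def accumulated_volume(diams_cm, lengths_m):
    """Return the accumulated volume (m3) up to the end of each piece."""
    vols, total = [], 0.0
    for d, L in zip(diams_cm, lengths_m):
        g = math.pi * (d / 100.0) ** 2 / 4.0   # cross-sectional area g_i (m2)
        total += g * L                          # add piece volume g_i * L_i
        vols.append(total)
    return vols

v = accumulated_volume([12.0, 10.0, 8.0], [2.0, 2.0, 2.0])
```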
K NEAREST NEIGHBOR

The k nearest neighbors of each instance are defined according to the selected metric; Equation 4 shows the Euclidean distance, which is commonly used with this algorithm (Witten et al. 2011). In order to eliminate the influence of variable scales, attributes were normalized according to Equation 5.
d = √[ Σ (pi − qi)² ] (4)

where: d = Euclidean distance between two points p(p1, p2, ..., pn) and q(q1, q2, ..., qn); p and q = different individuals with n attributes or explanatory variables.

ai = (vi − min vi) / (max vi − min vi) (5)

where: ai = normalized instance; vi = current value of instance i to be normalized; max vi and min vi = maximum and minimum values of attribute i.
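Equations 4 and 5 are small enough to sketch directly; the numeric examples are arbitrary.

```python
# Sketch of Equation 4 (Euclidean distance) and Equation 5 (min-max scaling).
import math

def euclidean(p, q):
    """Distance between two attribute vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def normalize(values):
    """Rescale an attribute to [0, 1] to remove scale effects."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(euclidean([0, 0], [3, 4]))  # 5.0
print(normalize([10, 20, 30]))    # [0.0, 0.5, 1.0]
```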
Stem form was described by the Hradetzky polynomial with fractional exponents (Equation 2):

di/dap = β0 + β1·(hi/ht)^p1 + ... + βn·(hi/ht)^pn + εi (2)

where: di = diameter i along the stem (cm); dap = diameter at 1.30 m above the ground (cm); hi = height i along the stem (m); ht = total height (m); βn = model coefficients; pn = exponents selected for the model; εi = random error inherent to the regression.
Volume was estimated by integrating the Hradetzky polynomial, as implemented in the Florexel software (Arce et al. 2000).
We stratified the data by diameter class. For fitting the volume and taper functions, data were divided into 3 classes: dap below 5 cm; 5 to 10 cm; and over 10 cm.
Three machine-learning algorithms were analyzed: k nearest neighbor (k-NN), Random Forest (RF) algorithm, and Artificial Neural Networks (ANN).
All algorithms were trained in the Weka 3.7.12 software (Hall et al. 2009), using 10-fold cross-validation, which divides the test set into 10 groups of approximately equal size. According to Faceli et al. (2011), it consists in using 9 groups to train the predictor and the remaining group for subsequent testing.

Aha (1992) established the concept of attribute weighting, thus avoiding the bias that noisy data can induce in the estimation. The instance to be estimated is thus obtained according to its distance from the training examples (Equation 6):

f(xq) = Σ wk · f(xi) / Σ wk (6)

where: f(xq) = unknown value of the instance to be estimated; f(xi) = observed instances used as a basis for the estimation; wk = weighting factor or weight of the k-th neighbor; k = number of neighbors used in the prediction.
In this work, the weighting of the k nearest neighbors was set as the inverse of their respective distances (1/d), as recommended by Bradzil et al. (2003) and Sanquetta et al. (2013).
Thus, the input configuration for this algorithm in the Weka software was: 20 nearest neighbors, cross-validation activated, weighting of distances by 1/d, and linear search for the nearest neighbors using the Euclidean distance.
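A minimal sketch of the inverse-distance weighting of Equation 6 follows; it is not Weka's implementation, and the toy data and k = 2 (rather than the 20 used in the study) are chosen only to keep the example short.

```python
# Sketch of Equation 6: k-NN estimate weighted by the inverse distance (1/d).
import math

def knn_predict(query, examples, k):
    """examples: list of (attribute_vector, observed_value) pairs."""
    nearest = sorted((math.dist(query, x), y) for x, y in examples)[:k]
    for d, y in nearest:
        if d == 0:
            return y                       # exact match decides alone
    wsum = sum(1.0 / d for d, _ in nearest)
    return sum(y / d for d, y in nearest) / wsum   # weighted mean, w_k = 1/d

examples = [([0.0], 1.0), ([1.0], 2.0), ([3.0], 4.0)]
print(knn_predict([0.5], examples, k=2))  # 1.5
```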

RANDOM FOREST
The RF algorithm has few input parameters. Default settings of the Weka software were kept, except for the number of trees to be created: the program suggests 100 regression trees, but Were et al. (2015) suggest that higher numbers may provide more stable results. Thus, 1,000 regression trees were built.
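To make the bootstrap-and-average idea behind RF concrete, the toy sketch below trains single-split regression "stumps" on bootstrap samples and averages their predictions. This is a deliberate simplification of a random forest (real trees are deeper and also sample the predictors), and the data are invented.

```python
# Toy sketch of the Random Forest idea: bootstrap sampling + averaging.
import random

def fit_stump(xs, ys):
    """Best single-threshold split on one predictor, by squared error."""
    best = None
    for t in xs:
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:                       # degenerate bootstrap sample
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def fit_forest(xs, ys, n_trees=1000, seed=1):
    """Train each stump on a bootstrap sample; predictions are averaged."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]   # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

forest = fit_forest([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 9.0, 9.0, 9.0])
```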

ARTIFICIAL NEURAL NETWORKS
The Weka software uses the sigmoidal logistic function as activation (Equation 7), varying from 0 to 1, which requires the data to be normalized to that scale.
f(x) = 1 / (1 + e^(−x)) (7)

The following input parameters were provided to Weka for the ANN training, both as fixed terms: momentum of 0.4 and 1,000 training epochs.
The learning rate and the number of neurons in the hidden layer were obtained with the CV Parameter Selection tool available in Weka, which optimizes algorithm parameters. For some parameters, we used literature values due to the high computational demand.
We used only one hidden layer since, according to the Universal Approximation Theorem, a single hidden layer is enough for an MLP network to approximate any continuous function (Haykin 2001).
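The sketch below shows Equation 7 and a forward pass through a one-hidden-layer MLP with 3 sigmoid neurons, matching the selected architecture in shape only; all weights, biases and inputs are arbitrary illustrations, not the trained network.

```python
# Sketch: sigmoid activation (Equation 7) and a 3-neuron hidden-layer forward pass.
import math

def sigmoid(x):
    """Logistic activation, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of sigmoid units feeding a linear output unit."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

y = mlp_forward(
    inputs=[0.2, 0.5],                                  # e.g. normalized dap, height
    w_hidden=[[0.4, -0.3], [0.1, 0.8], [-0.5, 0.2]],    # arbitrary weights
    b_hidden=[0.0, 0.1, -0.1],
    w_out=[0.6, -0.2, 0.3],
    b_out=0.05,
)
print(sigmoid(0.0))  # 0.5
```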

MODELS EVALUATION
Model results were evaluated according to the Pearson correlation (r) between observed and predicted values, the root mean square error in percentage (RMSE%), graphical analysis of absolute residuals, and frequency histograms of percentage errors by class, in order to support the ranking decision and to identify possible trends along the estimate line.
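The two main statistics can be sketched as follows, with RMSE expressed as a percentage of the observed mean; the observed and estimated values are toy numbers.

```python
# Sketch of the fit statistics: Pearson r and RMSE as % of the observed mean.
import math

def pearson_r(obs, est):
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (so * se)

def rmse_pct(obs, est):
    n = len(obs)
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / n)
    return 100.0 * rmse / (sum(obs) / n)

obs = [0.10, 0.20, 0.30, 0.40]   # toy observed volumes (m3)
est = [0.11, 0.19, 0.31, 0.38]   # toy predictions
print(pearson_r(obs, est), rmse_pct(obs, est))
```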
Additional statistics were used to complement the taper models' accuracy test, based on tests on the residuals according to the methodology of Parresol et al. (1987) and Figueiredo Filho et al. (1996).
Among them were: deviation (D), which indicates the existence or not of trends among residuals; residual percentage (RP), showing the error amplitude; the sum of squared relative residuals (SSRR), which relates the size of each residual to its real value; and the standard deviation of the differences (SD), which shows the homogeneity among residuals (Parresol et al. 1987).
Based on the seven statistics cited, models were ranked to determine which one performed best. This evaluation was performed by assigning scores, the smallest score being given to the best model in each statistic; the model with the lowest score sum was considered the best performer.
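The residual statistics and the score-based ranking can be sketched together; the formulas follow the general definitions given in the text (details may differ from Parresol et al. 1987), and the two "models" and their toy predictions are invented.

```python
# Sketch: residual statistics (D, RP, SSRR, SD) and rank-sum model scoring.
import math

def residual_stats(obs, est):
    res = [o - e for o, e in zip(obs, est)]
    n = len(res)
    D = sum(res) / n                                        # mean deviation
    RP = 100.0 * sum(r / o for r, o in zip(res, obs)) / n   # mean % residual
    SSRR = sum((r / o) ** 2 for r, o in zip(res, obs))      # sum of squared rel. residuals
    SD = math.sqrt(sum((r - D) ** 2 for r in res) / (n - 1))
    return {"D": D, "RP": RP, "SSRR": SSRR, "SD": SD}

def rank_models(stats_by_model):
    """Score 1 goes to the model with the smallest |value| of each statistic."""
    scores = {m: 0 for m in stats_by_model}
    stat_names = next(iter(stats_by_model.values())).keys()
    for stat in stat_names:
        ranked = sorted(stats_by_model, key=lambda m: abs(stats_by_model[m][stat]))
        for position, m in enumerate(ranked, start=1):
            scores[m] += position
    return scores

obs = [10.0, 12.0, 14.0]
stats = {"A": residual_stats(obs, [10.1, 11.9, 14.2]),   # small residuals
         "B": residual_stats(obs, [10.5, 12.6, 13.0])}   # larger residuals
scores = rank_models(stats)
best = min(scores, key=scores.get)   # lowest rank sum wins
```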

RESULTS AND DISCUSSION
Table I shows the coefficients fitted for the Schumacher and Hall model, all of which were significant. Fit indicators such as the correlation between observed and predicted values and the RMSE% show that the models are suitable for volume estimation. Trees below 5 cm dap were not considered in the analysis because of a restriction of the Weka program in fitting the AI models.
Table II shows the statistics for model evaluation, both for the training set used in fitting and for the validation set.
All models provided adequate statistics for their use, with correlation coefficients between observed and estimated values above 0.99 and low RMSE%. As expected, the training set statistics show greater accuracy. In the ranking for the training set, the three machine learning models were more accurate than Schumacher and Hall. However, in the validation set these models show less stable statistics: Schumacher and Hall and the ANN show very similar statistics in both sets, while the others do not.
The nearest neighbor algorithm (k-NN) was the best model in the training set, but did not achieve the same success in the validation set, although its error statistics still indicate good performance (Figure 1), with bias in the training set extremely subtle and very close to zero. On the other hand, both the Schumacher and Hall model and the ANN (when properly trained) model the underlying relationship, although the ANN is less explicit and more flexible due to its number of neurons.
Figure 1 also shows that the models did not present extreme residuals, with error frequencies concentrated in classes within ± 20%. A slight tendency of the ANN to underestimate volumes should be noted, especially those of smaller magnitude.
Silva et al. (2009) state that ANNs have accuracy similar or superior to regression models, with the additional advantages of generalization and plasticity, which allow a single network to predict the volume of trees from different sites and clones.
Gorgens et al. (2014) report promising results with ANNs trained with the back-propagation algorithm, suggesting networks with more than 10 neurons in the first layer and recommending more than one intermediate layer. However, the network optimization in this work did not select a complex model as the best result, since the best-performing network has 3 neurons in the intermediate layer. Özçelik et al. (2010) follow the same line of results regarding the simplicity of ANNs trained with an intermediate layer of 2 or 3 neurons. They emphasize as an advantage that ANNs can assimilate relations between variables through the weights of their connections, enabling networks to model systems with complex nonlinear relationships.
Table III shows the coefficients and respective exponents selected by the stepwise method, all coefficients being significant at 95% probability. Fit indicators such as the correlation between observed and predicted values and the RMSE% demonstrate that the models are suitable for estimating stem form. The model has greater difficulty expressing form for individuals in the dap class below 5 cm.
Table IV shows the evaluation statistics of the models, both for the training set used in fitting and for the validation set.
All models provided favorable statistics for their use, with correlation coefficients above 0.98 and a maximum RMSE% of 10%. In turn, the SSRR statistic showed greater discrepancy between models due to its quadratic nature. As expected, the training set statistics show greater accuracy.
Figure 2 shows that the models did not present extreme residuals, with error frequencies concentrated in classes within ± 20%.

Regarding the model ranking, as in the estimation of volume as a function of dap and height, it is evident that even though cross-validation was applied in the training of the AI models, these models do not necessarily achieve a good fit in the validation set. The nearest neighbor algorithm (k-NN) thus repeats the trend of being the best model in the training set and the worst in the validation set.
Leite et al. (2011) compared the performance of three types of ANN: linear perceptron, multilayer perceptron and radial basis function, with a second-degree polynomial taper function, established as Kozak's (1969) model. The authors point out that the ANN models showed better fit to the database, with low RMSE% values and residuals concentrated within about ± 10%. However, as in the present research, the models had difficulty estimating the smaller diameters located in the terminal portion of the stem, with strong tendencies to overestimate. Soares et al. (2011) reported good performance of ANNs in estimating relative diameters trained with the same methodology as this work, with correlation coefficients between 0.97 and 0.99 and RMSE% values of 7% on average. As a differential, they also point out good accuracy of the ANNs in the recursive prediction of diameters, that is, with only 3 measurements at the base in the generalization (validation) step of the model and subsequent estimation of the others. Soares et al. (2012) further propose the use of ANNs to study stem form without prior knowledge of total height, with correlation coefficients between 0.95 and 0.99 and mean RMSE% between 1% and 20%, considering a stratification of individuals by dap class.
Table V shows the evaluation statistics of the models, both for the training set used in fitting and for the validation set. All models showed correlation coefficients above 0.98, but with RMSE% values higher than those obtained in the relative diameter estimates, indicating greater variability in the estimates.
Regarding the ranking, the RF model was more accurate for the training set, followed by ANN.In turn, ANN proved to be the best performing model in the validation set, followed by the Hradetzky polynomial.
Figure 3 shows the residual graphs for total volume, showing that the k-NN model presented greater bias for both the training and validation sets. Binoti et al. (2014b) tested a method for obtaining total volume estimates with and without bark for Eucalyptus sp. using ANNs. Networks were trained with clone, dap, total height and the diameters at heights of 0.5, 1, 2 and 4 m, concluding that this methodology can be applied as an alternative to reduce diameter measurements along the stem, aiming at the cubing of standing trees. Sanquetta et al. (2015a) tested the performance of k-NN models for estimating the volume of Bambusa sp. stalks, combined with fifth-degree polynomial taper functions and Hradetzky's fractional exponents, besides other techniques for estimating stem volume. These authors consider that satisfactory results were not obtained with the k-NN models and attribute the poor performance to the small size of the database, also noting that using this technique in that case would only be indicated if the linear regression assumptions were violated.
The k-NN algorithm is versatile, simple and has great plasticity in modeling complex relationships. It is an important tool for estimating missing values and exploring local deviations in databases (Eskelson et al. 2009, Fehrmann et al. 2008). Compared to traditional regression methods, the k-NN algorithm has the disadvantage of not having well-studied statistical properties. Moreover, the sample size can be a limiting factor when accurate estimation is required (Mognon et al. 2014, Haara and Kangas 2012).
The Random Forest technique does not need assumptions about data distribution, because it has a high capacity to model complex interactions among a large number of predictive variables without overtraining to the database (Prassad et al. 2006). However, RF is not a tool for traditional statistical inference; therefore, it is not suitable for ANOVA and hypothesis tests. It does not compute p values, regression coefficients, or confidence intervals, and the lack of a mathematical equation or graphical representation may hinder its interpretation (Cutler et al. 2007).
ANNs have the ability to model non-explicit relationships among variables. Compared to traditionally applied models, they have the advantage of generating estimates for different strata with a single model, besides the possibility of including categorical variables (Gorgens et al. 2009, Silva et al. 2009 and Binoti et al. 2014a, b). ANNs also have great adaptability, tolerance to outliers and ease of application after network training (Leite et al. 2011). As disadvantages, the choice of the network configuration is difficult, the computational demand is high, and care is needed to avoid overtraining to the database.
On the other hand, regression analysis has a consolidated theory of wide application. It makes it possible to obtain different variables of interest, is less demanding in number of observations, and has the advantage of interpretable parameters. However, the predefined curve behavior makes it difficult to adjust properly to different databases. The advantage of Artificial Intelligence lies in the ability to identify relationships between variables that are too complex for parametric models (Strobl et al. 2009).

CONCLUSION
The Schumacher and Hall model and the ANN showed the best results for volume estimation as a function of dap and height when compared to Random Forest and k nearest neighbor.
The artificial intelligence methods used proved more accurate than the Hradetzky polynomial for estimating stem form attributes, such as the diameter along the stem and the total volume. ML models are appropriate as an alternative to the modeling traditionally applied in forest measurement; however, their use should be cautious because of the greater possibility of fit-based overtraining.

Figure 1 - Relative residual dispersion and frequency of percentage residuals by error class for the estimation of total volume as a function of dap and total height.

Figure 2 - Relative residual dispersion and frequency of percentage residuals by error class for diameter estimation along the stem.

TABLE II Fit statistics for estimation of stem total volume for training (1) and validation (2) sets.
r - correlation coefficient between observed and estimated values; RMSE - absolute root mean square error; RMSE% - percentage root mean square error; D - mean deviation; |D| - mean absolute deviation; SD - standard deviation of the differences; SSRR - sum of the squared relative residuals; RP - percentage residuals.
ANA B. SCHIKOWSKI et al.

TABLE IV Fit statistics for diameter estimate along the stem for training (1) and validation (2) sets.
r - correlation coefficient between observed and estimated values; RMSE - absolute root mean square error; RMSE% - percentage root mean square error; D - mean deviation; |D| - mean absolute deviation; SD - standard deviation of the differences; SSRR - sum of the squared relative residuals; RP - percentage residuals.
Figure 3 - Relative residual dispersion and frequency of percentage residuals by error class for the estimation of stem total volume.

TABLE V Fit statistics for estimation of stem total volume for training (1) and validation (2) sets.
r - correlation coefficient between observed and estimated values; RMSE - absolute root mean square error; RMSE% - percentage root mean square error; D - mean deviation; |D| - mean absolute deviation; SD - standard deviation of the differences; SSRR - sum of the squared relative residuals; RP - percentage residuals.