ESTIMATION OF EUCALYPTUS TREE HEIGHT IN CLONAL AND PROGENY TESTS USING ARTIFICIAL NEURAL NETWORKS

1 Received on 18.04.2016 accepted for publication on 28.06.2017. 2 Universidade Federal de Viçosa, Programa de Pós-Graduação em Ciências Florestal, Viçosa, Minas Gerais – Brasil. E-mail: <anaflorestaufv@gmail.com> and <araujo.91@hotmail.com>. 3 Universidade Federal de Viçosa, Graduado em Engenharia Florestal, Viçosa, Minas Gerais – Brasil. E-mail: <ra.barreto.so@gmail.com> and <filipema@gmail.com>. 4 Duratex, Agudos, São Paulo – Brasil. E-mail: <raul.chaves@duratex.com.br>. 5 Universidade Federal de Viçosa, Departamento de Engenharia Florestal, Viçosa, Minas Gerais – Brasil. E-mail: <hnpaiva@ufv.br> and <hgleite@ufv.br>. 6 Universidade Federal de Viçosa, Doutor em Ciência Florestal, Viçosa, Minas Gerais – Brasil. E-mail: <danielhbbinoti@gmail.com>. *Corresponding author.


1. INTRODUCTION
The evaluation of progenies and clones is the most time-consuming and costly stage of forest improvement. Planning the experiments involves defining their design, location, number of families, number of blocks, data to be collected, and analysis methods (Ramalho et al., 2005). For plants with long productive cycles, such as eucalyptus, this evaluation should be done as efficiently as possible, because erroneous inferences could lead to the loss of years of work. According to Reis et al. (2011), these tests help in the formation of more homogeneous and more productive stands.
Clonal tests are conducted in the last stage of forest improvement, with the selected genotypes being planted on a commercial scale or in pilot tests (Araujo et al., 2015). In these tests, the heights of all trees that make up the useful area of the plots are measured (Santos et al., 2006). Likewise, in progeny tests, the heights of all trees are measured. When several measurements are taken over time, a relatively large number of heights is measured along a rotation. These heights are used both in early selection (Pinto et al., 2014) and in selections made at the end of the experiment (Santos et al., 2006), usually close to the regulatory rotation used by the company.
In a forest inventory, instead of measuring the heights of all trees in a plot, hypsometric relationships are used. The models are fitted using the measured heights and diameters of some trees in the plots. The resulting equations are applied to the other trees (Campos and Leite, 2013).
An alternative approach for modeling hypsometric relationships involves the application of artificial neural networks (ANNs). Binoti et al. (2012) trained an ANN for different percentages of height measurement reductions in a plot, verifying the possibility of measuring the height of only 10% of the trees in the inventory plots without losing precision in the estimates. Other studies have demonstrated the efficiency of using ANNs for predicting tree heights in forest inventory plots (Binoti et al., 2013; Vendruscolo et al., 2015; Ozçelik et al., 2013), mapping natural forest biomass (Schoeninger et al., 2009), classifying satellite images (Andrade, 2003), estimating stem shapes (Schikowski et al., 2015), estimating the nutritional efficiency of eucalyptus leaves (Lafetá, 2012), and predicting growth and production (Alcântara, 2015), among other applications.
Based on the results obtained for inventory height estimation, it is important to evaluate ANN efficiency for research experiments in which the number of tree heights measured annually is relatively high. Therefore, this study aimed to evaluate the efficiency of an ANN used to estimate tree heights in clonal and progeny tests, and how much this method reduces time and costs relative to measuring the heights of 100% of the trees in experimental plots.

Data Description
The data used in this study were obtained from clonal and progeny tests with eucalyptus (conducted by forest companies), involving a complete rotation of approximately six years.
To train the networks to estimate heights in clonal tests, 114 treatments were used. Each treatment contained six blocks, for a total of 8,329 data points collected for six tree ages. For the progeny tests, 215 treatments were used as the basis for training the networks, each containing ten blocks, totaling 36,793 data points distributed across five ages. Descriptive analyses for both tests are presented in Table 1.

Artificial neural networks:
To train and validate the ANN used for the clonal and progeny tests, a tree from each block was selected. In sub-sample 1, the first tree of the block was selected, while in sub-sample 2, a tree was selected randomly in each block. These sub-samples were separated, with 70% used for training and 30% used for network validation. The remaining unselected trees were used for ANN generalization.
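The selection scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the record keys (`treatment`, `block`, `position`), the function names, the toy data, and the fixed random seed are all hypothetical.

```python
import random
from collections import defaultdict


def select_one_per_block(trees, mode, rng):
    """Return one tree per (treatment, block): the first one or a random one."""
    groups = defaultdict(list)
    for tree in trees:
        groups[(tree["treatment"], tree["block"])].append(tree)
    selected = []
    for key in sorted(groups):
        members = sorted(groups[key], key=lambda t: t["position"])
        selected.append(members[0] if mode == "first" else rng.choice(members))
    return selected


def split_train_validation(sample, train_fraction, rng):
    """Shuffle the sub-sample and split it into training and validation parts."""
    shuffled = list(sample)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: 2 treatments x 5 blocks x 5 trees per plot.
trees = [
    {"treatment": t, "block": b, "position": p}
    for t in range(2) for b in range(5) for p in range(1, 6)
]
rng = random.Random(0)  # fixed seed, assumed here only for reproducibility
sub1 = select_one_per_block(trees, "first", rng)   # sub-sample 1
sub2 = select_one_per_block(trees, "random", rng)  # sub-sample 2
train, validation = split_train_validation(sub1, 0.7, rng)
# Trees that were never selected are left for generalization.
generalization = [t for t in trees if t not in sub1]
```

With this toy layout, each sub-sample holds one tree per plot, and the trees outside the sub-sample form the generalization set.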
The trained networks were multilayer perceptrons (MLP), in which two layers (intermediate and output) process data, while the input layer only receives data (Haykin, 2001). The categorical variables used in the input layer were age, treatment, and block. We used diameter at breast height (dbh) as a continuous input variable.
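One way to picture the input layer is as a single vector that combines the categorical variables with the continuous diameter. The text does not say how the software encodes categorical variables, so the one-hot encoding below is an assumption, and the function names, the variable name `dbh`, and the category levels are hypothetical.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_input(age, treatment, block, dbh, levels):
    """Concatenate the encoded categorical inputs with the continuous diameter."""
    return (
        one_hot(age, levels["age"])
        + one_hot(treatment, levels["treatment"])
        + one_hot(block, levels["block"])
        + [dbh]
    )


# Hypothetical category levels, for illustration only.
levels = {"age": [2, 4, 6], "treatment": ["T1", "T2"], "block": [1, 2, 3]}
x = build_input(age=4, treatment="T2", block=1, dbh=12.5, levels=levels)
```

Under this assumed encoding, the input vector has one position per category level plus one position for the diameter.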
To train the networks, we used the application Neuroforet 3.3 and a resilient propagation (RPROP+) algorithm, which adapts weight updates according to the behavior of the error function (Riedmiller and Braun, 1993). Training stopped when either of the following criteria was met: number of cycles equal to 3000, or average error equal to 0.0001. Numerical data were normalized on a 0 to 1 scale. These parameters were defined based on previous studies of the estimation of the height of eucalyptus trees in forest inventory plots (Silva, 2012).
The chosen ANN was evaluated for each combination of age and treatment, based on the Kolmogorov-Smirnov (K-S) test for the normality of residuals. Training, validation, and generalization estimates were grouped so that the treatments were complete in relation to the number of trees sampled. Because of the large number of treatments in both tests, the p-values were grouped in intervals, and the percentage of treatments with p-values higher than 0.05 was determined. Thus, the greater the number of treatments with residual normality, the more accurate the network was at estimating the tree heights for different treatments.
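The normalization to a 0 to 1 scale and the stopping criteria can be illustrated as follows. This is a minimal sketch, not the software's implementation: linear min-max scaling is assumed as the normalization, the error criterion is read as "at or below 0.0001", and the function names and diameter values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the 0-1 interval (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def should_stop(cycle, mean_error, max_cycles=3000, target_error=0.0001):
    """Stop when the cycle limit or the target average error is reached."""
    return cycle >= max_cycles or mean_error <= target_error


diameters = [8.0, 10.0, 14.0, 18.0]  # hypothetical diameters in cm
scaled = min_max_normalize(diameters)
```

In a training loop, `should_stop` would be checked once per cycle with the current average error.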

3. RESULTS
Considering the two sub-samples, most of the training estimate errors in both tests varied by an average of 5%, with a trend of non-normality in the residuals, as observed in the histogram graphs in Figures 1 and 2.
The two methods of selecting trees within a block for training the network were compared, based on the generalization and normality statistics of the residuals for each treatment. Histograms of the ANN generalization estimates in the clonal test showed that 89% and 87% of the residuals for sub-samples 1 and 2, respectively, varied by an average of 10% (Figure 1). In the progeny test, 74% and 73% of the residuals of the generalization estimates from networks trained with sub-samples 1 and 2, respectively, varied by an average of 10% (Figure 2). The validation estimates were similar to the generalizations in both tests and sub-samples (Figures 1 and 2).
Table 2 shows the p-values of the K-S test for verifying the normality of residuals for the combination of 1) age and treatment, and 2) sub-sample and test, grouped into intervals of 0.05. The normality of residuals was verified in 99% of clonal test treatments when sub-sample 1 was used to train the networks. The random selection of data for network training was less precise, with 97% of the treatments presenting normality in the residuals. In the progeny test, the normality of residuals was observed in 99% of the treatments in both sub-samples.
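The grouping of p-values into intervals of 0.05 and the share of treatments with p-values above 0.05 can be sketched as below. The p-values are assumed to have been computed beforehand by a K-S test for each age and treatment combination. The values shown are hypothetical, and the function name and interval convention (lower bound inclusive) are assumptions.

```python
import math
from collections import Counter


def summarize_p_values(p_values, width=0.05, alpha=0.05):
    """Count p-values per interval and return the percentage above alpha."""
    n_bins = round(1 / width)
    counts = Counter()
    for p in p_values:
        # Small tolerance guards against floating-point error at interval edges;
        # p == 1.0 is placed in the last interval.
        index = min(math.floor(p / width + 1e-9), n_bins - 1)
        counts[index] += 1
    share_above = 100.0 * sum(p > alpha for p in p_values) / len(p_values)
    return counts, share_above


p_values = [0.01, 0.07, 0.32, 0.58, 0.91]  # hypothetical K-S p-values
counts, share_above = summarize_p_values(p_values)
for index in sorted(counts):
    print(f"{index * 0.05:.2f}-{(index + 1) * 0.05:.2f}: {counts[index]}")
print(f"treatments with p > 0.05: {share_above:.0f}%")
```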
Table 3 shows the time taken to measure the heights in the database, represented by days saved (based on an 8 h working day).

4. DISCUSSION
For both tests, the errors varied by an average of 10%, showing that the results generated by the ANNs were satisfactory. Both the relatively high correlation estimates between the data sets and the high percentage of treatments with normality in the residuals show that the networks were able to estimate tree heights in the experimental plots with sufficient accuracy for most of the treatments and all of the age ranges present in the database. The efficiency of neural networks for height estimation has already been demonstrated for forest inventories (where several categorical, edaphic, physiographic, and climatic variables are included) and forest inventory records (Binoti et al., 2012; Ozçelik et al., 2013).
When analyzing the results of the normality tests, we verified that for a few treatments, the network did not accurately estimate heights. The highest errors were found for trees of lower heights, increasing the efficiency of the method, because these types of [...] pruning, volume diameter, etc. (Ferreira, 1992).
The highest percentages of treatments with normality in the residuals in the clonal tests were observed in the estimates obtained by networks trained with the first tree of the block (sub-sample 1). In the progeny test, there were no differences in terms of ANN training between selecting the first tree of the block or selecting a random tree. Therefore, we recommend that the first tree of each block be measured, so that the operating procedure is simpler for operators.
Given the efficiency of ANNs in estimating heights in experimental plots, it is possible to analyze the time gains produced by measuring the height of only one replicate in each block for each genotype (20% of the data). This means that individual measurement would no longer be needed for 6,663 trees in clonal tests and 29,434 trees in progeny tests.
Considering that, on average, a measurer spends 1 min measuring and recording the height of a tree (without considering worker displacement), this means a time savings of 6,663 min (111.05 h, or 13.88 working days of 8 h) when measuring tree heights in clonal tests. In progeny tests the time savings are even greater: 29,434 min (490.57 h, or 61.32 working days of 8 h). These time and labor savings represent a significant reduction in the time required to measure experimental plots, which would lead to a significant reduction in research costs.
When training and applying neural networks for a specific year, age could be eliminated from the input layer. Another potential application that could be tested by users, using the configuration proposed in this study, is to train networks with one year of data and apply them to another year, generalizing the heights of all trees on the second occasion. This application might be necessary if, in a given year, research measurement costs must be significantly reduced for economic reasons. This alternative was not tested in the present study, because the aim was to construct, propose, apply, and demonstrate the efficiency of neural networks in estimating height in the case of experimental measurement. We found that by measuring only one height per replicate, it is possible to accurately generalize the heights of the other trees.
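The time-savings arithmetic above can be reproduced with a short calculation. The tree counts, the 1 min per tree, and the 8 h working day come from the text; the function name and its parameter names are hypothetical.

```python
def time_savings(trees_not_measured, minutes_per_tree=1.0, hours_per_day=8.0):
    """Convert the number of unmeasured trees into minutes, hours, and working days."""
    minutes = trees_not_measured * minutes_per_tree
    hours = minutes / 60
    days = hours / hours_per_day
    return minutes, hours, days


for test_name, n_trees in [("clonal", 6663), ("progeny", 29434)]:
    minutes, hours, days = time_savings(n_trees)
    print(f"{test_name}: {minutes:.0f} min = {hours:.2f} h = {days:.2f} working days")
# clonal: 6663 min = 111.05 h = 13.88 working days
# progeny: 29434 min = 490.57 h = 61.32 working days
```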

5. CONCLUSIONS
Artificial neural networks are efficient at estimating height in clonal and progeny tests. They allow a reduction in working time in the measurement of experimental plots, and consequently a reduction in research costs, without losing measurement accuracy.
By measuring the height of only the first tree of each treatment replicate (genotypes), in clonal and progeny tests, the heights of the other trees can be estimated with sufficient accuracy.

Five networks were trained, and the best network was selected based on the following statistics, calculated for each set (training, validation, and generalization): the correlation between observed and estimated heights ($r_{y\hat{y}}$), the standard deviation of the mean percentage error, and the histogram of residuals, where $y_i$ and $\hat{y}_i$ = observed and estimated values of the variable under analysis; $\hat{y}_m$ and $\bar{y}$ = estimated and observed mean values; $n$ = number of cases; and $S^2$ = sample variance. The histograms for the training, validation, and generalization of the networks were analyzed using a class width of 5%.
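The correlation between observed and estimated heights and the grouping of residuals into 5% classes can be sketched as follows. The standard Pearson form of the correlation is assumed, and the percentage residual is assumed to be 100 × (estimated − observed) / observed, since the original expressions are not reproduced in the text. The function names and height values are hypothetical.

```python
import math


def correlation(observed, estimated):
    """Pearson correlation between observed and estimated heights."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    return cov / math.sqrt(var_obs * var_est)


def residual_classes(observed, estimated, width=5.0):
    """Count percentage residuals per class, keyed by the class lower bound (%)."""
    counts = {}
    for o, e in zip(observed, estimated):
        residual = 100.0 * (e - o) / o  # assumed definition of the % residual
        lower = math.floor(residual / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical heights in metres, for illustration only.
observed = [20.0, 22.0, 25.0, 18.0]
estimated = [20.6, 21.5, 25.5, 16.0]
r = correlation(observed, estimated)
classes = residual_classes(observed, estimated)
```

Each key of `classes` is the lower bound of a 5% residual class, so the dictionary maps directly onto the bars of a residual histogram.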

Figure 2 - Standard error, correlation between estimated and observed values, and histograms of the estimates generated by the ANN in the training, validation, and generalization of the progeny test with sub-samples 1 and 2. The x-axis shows the residual classes (%), and the y-axis shows the observed frequencies for each class.

Figure 1 - Standard error, correlation between estimated and observed values, and histograms of the estimates generated by the ANN in the training, validation, and generalization of the clonal test with sub-samples 1 and 2. The x-axis shows the residual classes (%), and the y-axis shows the observed frequencies for each class.

Table 1 - Descriptive analysis of clonal and progeny test data.