Application of neural networks to predict volume in eucalyptus

The aim of this study was to evaluate the methodology of Artificial Neural Networks (ANN) for predicting wood volume in eucalyptus and its impact on the selection of superior families, and to compare ANN with regression models. Data were obtained from a randomized block design with 140 half-sib families, with five replications at three years of age and four replications at six years of age, both with five plants per plot. The volume was estimated using ANN and regression models. For training the ANN, 2000 and 1500 observations were used, and for validation, 1500 and 1300 observations, at 3 and 6 years of age, respectively. It is concluded that ANN can help improve the accuracy of volume measurement in eucalyptus trees and automate the forest inventory process, and that ANN were more accurate in predicting wood volume than almost all regression models.


INTRODUCTION
The large forest cover in Brazil, combined with excellent soil and climate conditions, points to great advantages for investment in the Brazilian forestry sector (Juvenal and Mattos 2002). Approximately 3 million hectares of forests planted for commercial purposes in Brazil are planted with the genus Eucalyptus. Investments by businesses that need raw material for energy and for paper and cellulose, combined with the efforts of government and university research institutes, have made Brazil one of the leading producers of eucalyptus in the world.
Clonal plantations of eucalyptus hybrids can produce up to 50 m³ ha⁻¹ year⁻¹. In Chile, the United States, Canada and Finland, production is 20, 10, 7 and 4 m³ ha⁻¹ year⁻¹, respectively. The total area planted with eucalyptus in Brazil is 3,549,147 hectares. Of this total, the largest producers are Minas Gerais, São Paulo, Bahia and Espírito Santo, contributing 31%, 23%, 15% and 6%, respectively, and the other states contribute 25% of Brazilian production. According to the Sociedade Brasileira de Silvicultura, eucalyptus generates 16 billion dollars in exports and domestic products, in addition to 2 million direct and indirect jobs.
Volumetric equations are used to calculate the amount of wood produced, which quantifies the productive potential of the cultivated area and the income of the farmer and/or the industry. Several trees are cut down so that their diameter and height can be measured.
Forest measurements are an important element of forest management, since they provide precise information on the forest, allowing proper decision making and the best design of management activities. The two most used variables in forest measurements are height and diameter at breast height (DBH), which are used to calculate the basal area and the wood volume existing in a forest (Freitas and Wichert 1998). For a forest inventory to be reliable, it is necessary to know its sources of error in order to eliminate them or to minimize their effect on the precision of the measurements.
According to Peres (1989), errors in measuring tree diameters and heights are caused by operator mistakes, by problems with the tools and by observation conditions. Errors in diameter measurement are more important than those in height measurement: an error of 1 cm in the diameter corresponds to a maximum of 19% in the calculation of the volume, whereas an error of 1 m in the height corresponds to only 14% of the volume (Couto et al. 1989).
LL Bhering et al.
Because DBH and height data are collected from a large number of trees, many errors may be made in the field, harming forest measurement and, consequently, the selection of superior genotypes useful for plant breeding programs. An alternative is to use volume prediction methodologies based only on DBH, which reduce the labor of field measurements and can result in more precise measurements.
An artificial neural network is a mathematical or computational model based on biological neural networks (Braga et al. 2007). The artificial neural network model is a supervised learning method that uses training data to produce models that generate predictions for response variables (Mugnai et al. 2008).
Agronomic uses of artificial neural network (ANN) models are found in production prediction in rice (Kaul et al. 2005, Ji et al. 2007), in genetic diversity analysis (Barbosa et al. 2011), in disease prediction using classification of satellite images (França et al. 2010), in breeding value prediction in animals (Ventura et al. 2012), and in selection in alfalfa (Nascimento et al. 2013) and sugarcane (Brasileiro et al. 2015).
Neural networks have been used in forestry modeling to estimate several parameters, such as diameter, height and volume of trees, among others, but this methodology has not yet been used in eucalyptus breeding. Several authors have predicted height based on DBH, age and other statistical information using artificial neural networks (Castaño-Santamaría et al. 2013, Özçelik et al. 2013). Castro et al. (2013) used ANN for growth models in Eucalyptus. Özçelik et al. (2010) estimated tree bole volumes of 89 Scots pine (Pinus sylvestris L.), 96 Brutian pine (Pinus brutia Ten.), 107 Cilicica fir (Abies cilicica Carr.) and 67 Cedar of Lebanon (Cedrus libani A. Rich.) trees using ANN. The results reported by the authors suggested that the selected cascade correlation artificial neural network (CCANN) models are reliable for estimating tree bole volume, since they gave unbiased results and were superior to almost all methods in terms of error (%), expressed as the mean of the percentage errors. Huang et al. (2009) used neural networks to find the frequency distribution of diameter and trunk classes, based on the relative diameter, mean diameter and coefficient of variation of diameter.
Artificial neural networks have outperformed regression models for several reasons: they have a massively parallel distributed structure (layers); they are able to learn and generalize, which allows them to solve complex problems; they are tolerant to failures and noise; they can model several variables and their nonlinear relationships; and they can model categorical (qualitative) as well as numerical (quantitative) variables (Haykin 2001).
Therefore, the objective of this work was to evaluate the methodology of neural networks for predictions of wood volume in breeding programs of eucalyptus and its impacts on selection of superior genetic material, and to compare artificial neural network with regression models.

MATERIAL AND METHODS
Plant material and experimental design
One hundred and forty half-sib families of eucalyptus were evaluated. The experiments were carried out in a randomized block design with five replications at three years of age and four replications at six years of age, both with five plants per plot and a spacing of 3 × 3 m.
The original individual volume, in cm³, was estimated from an expression in which: π is 3.14159; DBH is the diameter at breast height (cm); height is expressed in cm; and 0.43 is the form factor usually used in Brazilian plant breeding.
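The expression itself is not reproduced in the text above. Assuming it is the usual form-factor equation, that is, the cylinder volume π × DBH² / 4 × height multiplied by the form factor, a minimal sketch of the calculation could look as follows. The function name and the example tree are hypothetical.

```python
import math

FORM_FACTOR = 0.43  # form factor quoted in the text


def stem_volume_cm3(dbh_cm: float, height_cm: float,
                    form_factor: float = FORM_FACTOR) -> float:
    """Cylinder volume at breast-height diameter, scaled by a form factor.

    ASSUMPTION: V = pi * DBH^2 / 4 * height * f is the common form-factor
    equation; the source's own expression is not shown in the text.
    """
    basal_area_cm2 = math.pi * dbh_cm ** 2 / 4.0
    return basal_area_cm2 * height_cm * form_factor


# Hypothetical tree: DBH = 12 cm, height = 15 m = 1500 cm.
volume = stem_volume_cm3(12.0, 1500.0)
print(f"{volume:.0f} cm^3 = {volume / 1e6:.4f} m^3")
```

With both DBH and height in centimeters the result is in cm³; dividing by 10⁶ converts it to m³.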
The assessed traits were statistically analyzed using the software Genes (Cruz 2013). The statistical model adopted in this study was a randomized block model with information within the plot and a varying number of plants among plots, in which: $Y_{ijk}$ is the observation of the k-th individual assessed in the i-th family in the j-th block; $\mu$ is the overall mean of the trial; $g_i$ is the random effect of the i-th genotype (half-sib family) [i = 1, 2, ..., g]; $b_j$ is the random effect of the j-th block [j = 1, 2, ..., b]; and $\varepsilon_{ij}$ is the random environmental effect among plots.
After calculating the mean squares, the variance components were estimated from them, including the among-plot variance ($\sigma^2_a$, expression IV), where: n is the number of plants within the plot; r is the number of blocks; and MS is the mean square. A number of plants equivalent to the harmonic mean was used in the expected values of the mean squares for estimation, since this number varied among plots. According to Ramalho et al. (2000), this procedure is satisfactory when the number of failures is not very large.
Crop Breeding and Applied Biotechnology 15: 125-131, 2015
The mean number of individuals within each plot was obtained as the harmonic mean of the number of plants per plot. The expressions described by Vencovsky and Barriga (1992) were used to determine the coefficients of heritability and variation, including the heritability associated with selection on half-sib family means (expression VII). Once the genetic parameters were known, it was possible to predict the gain from selection using the expression $SG = h^2 \times SD$, where: SG is the gain from selection; $h^2$ is the heritability; and SD is the selection differential.
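The estimators themselves are not reproduced above. As a rough sketch only, the code below assumes the textbook expected mean squares for a randomized block design with within-plot information: E(MS within) = σ²_w, E(MS among plots) = σ²_w + nσ²_a, and E(MS families) = σ²_w + nσ²_a + nrσ²_g. The function name, the dictionary keys and all numeric inputs are hypothetical and are not taken from the study.

```python
from statistics import harmonic_mean


def variance_components(ms_family, ms_among, ms_within, n, r):
    """Method-of-moments estimates from the three mean squares.

    ASSUMPTION: standard expected mean squares for half-sib families in
    randomized blocks with n plants per plot and r blocks.
    """
    var_within = ms_within
    var_among = (ms_among - ms_within) / n
    var_genetic = (ms_family - ms_among) / (n * r)
    # Heritability on a family-mean basis: genetic variance divided by the
    # phenotypic variance of family means, MS_family / (n * r).
    h2_family_mean = var_genetic / (ms_family / (n * r))
    return {
        "var_within": var_within,
        "var_among": var_among,
        "var_genetic": var_genetic,
        "h2_family_mean": h2_family_mean,
    }


# Hypothetical plots with some missing plants; n is their harmonic mean.
plants_per_plot = [5, 5, 4, 5, 3, 5]
n_h = harmonic_mean(plants_per_plot)

# Hypothetical mean squares, five blocks.
comps = variance_components(ms_family=0.030, ms_among=0.006,
                            ms_within=0.002, n=n_h, r=5)
print(round(n_h, 3), round(comps["h2_family_mean"], 3))
```

Under these assumptions the family-mean heritability reduces to (MS family − MS among plots) / MS family, so it does not depend on n or r.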

Selection differential
The selection differential was obtained as the difference between the original mean of the population and the mean of the selected half-sib families, which corresponded to 10% of the half-sib families, that is, the mean of the 14 most productive half-sib families.
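A small sketch of how the selection differential and the predicted gain (heritability times selection differential) can be computed from family means. The family names, means and heritability below are hypothetical illustration values, not results from the study.

```python
def selection_gain(family_means, n_selected, h2):
    """Return selected families, selection differential, gain and gain in %."""
    overall_mean = sum(family_means.values()) / len(family_means)
    # Rank families from most to least productive and keep the best ones.
    selected = sorted(family_means, key=family_means.get, reverse=True)[:n_selected]
    selected_mean = sum(family_means[f] for f in selected) / n_selected
    sd = selected_mean - overall_mean      # selection differential
    sg = h2 * sd                           # predicted gain from selection
    return selected, sd, sg, 100.0 * sg / overall_mean


# Ten hypothetical families with mean volumes from 0.10 to 0.19.
means = {f"F{i + 1}": 0.10 + 0.01 * i for i in range(10)}
selected, sd, sg, sg_percent = selection_gain(means, n_selected=2, h2=0.8)
print(selected, round(sd, 4), round(sg, 4), round(sg_percent, 2))
```

In the experiment described here the same calculation would use the 140 family means and the 14 best families.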

Neural network
In this study, multilayer neural networks were used. The proposed neural network has one input layer, three intermediate layers and one output layer. The input data for network training were the DBH values of 2100 plants at 3 years of age and 1400 plants at 6 years of age. For validating the artificial neural networks, 1400 plants at each of 3 and 6 years of age were used. The first three blocks were used for training, and the last two blocks were used for validation. Training data were not expanded. In the intermediate layers, the number of neurons varied from 1 to 10 in the first layer, 1 to 20 in the second layer, and 1 to 8 in the third layer. The output layer consisted of one neuron, and the output was the wood volume estimated by the artificial neural network based only on the DBH. This value was known in training but unknown in validation. The best network architecture was the one with the highest mean accuracy among the 43200 possibilities, calculated by multiplying the number of neurons in each layer and the possible activation functions (10 × 20 × 8 × 3 × 3 × 3).
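The size of this search space can be checked by enumerating the candidates. The sketch below assumes one activation function is chosen independently for each of the three intermediate layers, which is how the three factors of 3 in the product are read here; the names are illustrative only.

```python
from itertools import product

ACTIVATIONS = ("purelin", "tansig", "logsig")
# Neurons per intermediate layer: 1-10, 1-20 and 1-8.
LAYER_SIZES = (range(1, 11), range(1, 21), range(1, 9))


def candidate_architectures():
    """Yield (layer sizes, layer activations) for every combination."""
    for sizes in product(*LAYER_SIZES):
        for activations in product(ACTIVATIONS, repeat=3):
            yield sizes, activations


n_candidates = sum(1 for _ in candidate_architectures())
print(n_candidates)  # 10 * 20 * 8 * 3 * 3 * 3 = 43200
```

Each candidate would then be trained and scored on the validation blocks, and the one with the highest mean accuracy kept.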

Activation functions used were:
Linear transfer function (purelin): calculates the neuron's output by simply returning the value passed to it, that is, $\alpha = n$, where: $\alpha$ is the output in each layer, and $n$ is the input in each layer.
Tan-sigmoid transfer function (tansig): tan-sigmoid output neurons are often used for pattern recognition problems, while linear output neurons are used for function fitting problems. This function is used in multilayer networks, and is defined by $\alpha = 2 / (1 + e^{-2n}) - 1$.
Log-sigmoid transfer function (logsig): the function logsig generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity. This function is defined by $\alpha = 1 / (1 + e^{-n})$.
The three transfer functions described here are the most commonly used transfer functions for multilayer networks, but other differentiable transfer functions can be created and used if desired.
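For illustration, the three transfer functions can be written out as plain scalar functions. This is a sketch in Python, not the Matlab code used in the study; the formulas follow the standard definitions of purelin, tansig and logsig.

```python
import math


def purelin(n: float) -> float:
    """Linear transfer function: returns its input unchanged."""
    return n


def tansig(n: float) -> float:
    """Tan-sigmoid: 2 / (1 + exp(-2n)) - 1, equal to tanh(n), range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def logsig(n: float) -> float:
    """Log-sigmoid: 1 / (1 + exp(-n)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


# Note: math.exp overflows for very negative n in this naive form; a
# numerically careful version would clamp or branch on the sign of n.
for x in (-2.0, 0.0, 2.0):
    print(x, purelin(x), round(tansig(x), 4), round(logsig(x), 4))
```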
Bayesian regularization backpropagation (trainbr) was used to train the neural network, with 1000 epochs (iterations). Trainbr is a network training function that updates the weight and bias values according to Levenberg-Marquardt optimization. It minimizes a combination of squared errors and weights, and then determines the correct combination in order to produce a network that generalizes well. This process is called Bayesian regularization. The analyses were carried out using the software Matlab (MathWorks 2011).
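The "combination of squared errors and weights" is commonly written as a weighted sum of the sum of squared errors and the sum of squared weights. The sketch below only evaluates that objective for fixed coefficients. In Bayesian regularization the two coefficients are re-estimated during training, which is not shown here. The parameter names and numbers are hypothetical.

```python
def regularized_objective(errors, weights, error_scale, weight_penalty):
    """Weighted sum of squared prediction errors and squared network weights.

    ASSUMPTION: objective of the form
    error_scale * sum(e^2) + weight_penalty * sum(w^2).
    """
    sum_squared_errors = sum(e * e for e in errors)
    sum_squared_weights = sum(w * w for w in weights)
    return error_scale * sum_squared_errors + weight_penalty * sum_squared_weights


# Hypothetical residuals and weights.
objective = regularized_objective(
    errors=[0.1, -0.2],
    weights=[0.5, -1.0, 2.0],
    error_scale=1.0,
    weight_penalty=0.01,
)
print(round(objective, 4))
```

A larger weight penalty favors smaller weights and therefore a smoother fitted function, which is what helps the network generalize.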

Statistical analysis
Spearman and Pearson correlations between the original and ANN-predicted volumes of the half-sib families were estimated, and the coincidence index between the 20% best families selected by the direct and indirect selection methods was calculated.
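The three agreement measures can be sketched in plain Python as follows. The coincidence index is taken here as the percentage of families common to the two top fractions, which is an assumption about its exact definition. The ten family means are hypothetical.

```python
def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties get the mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coincidence_index(x, y, fraction):
    """Percentage of items shared by the top `fraction` of x and of y."""
    k = round(len(x) * fraction)
    top_x = set(sorted(range(len(x)), key=x.__getitem__, reverse=True)[:k])
    top_y = set(sorted(range(len(y)), key=y.__getitem__, reverse=True)[:k])
    return 100.0 * len(top_x & top_y) / k


original = [0.10, 0.12, 0.15, 0.11, 0.18, 0.14, 0.16, 0.13, 0.17, 0.19]
predicted = [0.11, 0.12, 0.16, 0.10, 0.17, 0.14, 0.15, 0.13, 0.18, 0.19]
r_pearson = pearson(original, predicted)
r_spearman = spearman(original, predicted)
coincidence = coincidence_index(original, predicted, 0.20)
print(round(r_pearson, 4), round(r_spearman, 4), coincidence)
```

With 140 families and a 20% fraction, the top set has 28 families, so 26 shared families would give 26 / 28, or about 92.9%.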
To evaluate the accuracy of the Artificial Neural Network (ANN), it was compared with regression models such as the linear, quadratic, cubic, square root, potential and exponential models.
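As an illustration of comparing models by the coefficient of determination, the sketch below fits three of the listed models (linear, potential and exponential) with simple least squares. The potential and exponential models are fitted on a log scale, which is a simplification assumed here and not necessarily the fitting method used in the study. The polynomial and square root models are omitted because they need multiple regression. The data are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def r_squared(observed, predicted):
    """Coefficient of determination on the original scale."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def compare_models(dbh, volume):
    """Return R^2 for linear, potential and exponential models of volume on DBH."""
    log_v = [math.log(v) for v in volume]
    a, b = fit_line(dbh, volume)                        # V = a + b * DBH
    linear = [a + b * d for d in dbh]
    a, b = fit_line([math.log(d) for d in dbh], log_v)  # V = e^a * DBH^b
    potential = [math.exp(a) * d ** b for d in dbh]
    a, b = fit_line(dbh, log_v)                         # V = e^(a + b * DBH)
    exponential = [math.exp(a + b * d) for d in dbh]
    fits = {"linear": linear, "potential": potential, "exponential": exponential}
    return {name: r_squared(volume, pred) for name, pred in fits.items()}


# Synthetic data following a power law, so the potential model fits best.
dbh = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
volume = [0.0001 * d ** 2.5 for d in dbh]
r2 = compare_models(dbh, volume)
print({name: round(value, 4) for name, value in r2.items()})
```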

RESULTS AND DISCUSSION
During the analyses, different activation functions were tested to reach the network configuration that best fitted the data, so that it could be used in validation. For the three-year data, the best performance was achieved with four neurons in the first hidden layer and only three neurons in the second hidden layer. Both layers performed best with the tansig activation function. A similar result was obtained for the six-year data, where the best performance was achieved with four neurons in the first hidden layer and two neurons in the second. Again, the data were better explained using the tansig activation function.
After the best network configuration was found in training, the volumes of the individuals of each family in each replication and plot were estimated, making it possible to compare the parameters estimated from the original volume with those estimated from the volume calculated by the neural network.
The use of neural networks to estimate the height and volume of trees has already been reported (Silva et al. 2009, Özçelik et al. 2010, Özçelik et al. 2013). In those studies, information on height and DBH was used as input to the adopted network. The only disadvantage of those studies is that trees with the same DBH and total height, but with different stem forms, were assigned the same volume.
The analysis of variance for the two variables (original volume and volume predicted by the network) for the evaluations at three and six years of age of the eucalyptus plantation with 140 half-sib families is shown in Table 1. It can be concluded that the experiment has genetic variation different from zero, since the F test was significant for both variables at both evaluation times. Therefore, it is possible to use it for selection of the best families.
Table 1 also shows that the estimated means of the two variables were very close in both situations, at three years of age (0.1118 and 0.1120) and at six years of age (0.2236 and 0.2257).
The maximum and minimum values for both modes of volume prediction were also very close. The coefficient of experimental variation (CV), which measures the experimental precision, was identical in the evaluation at three years of age (16.88 and 16.88%) and very close in the evaluation at six years of age (21.86 and 21.81%). This shows that experimental quality was not affected by estimating volume with neural networks.
Other important pieces of information to be evaluated in a breeding program are the estimates of genetic variance and heritability, which measure how much of the phenotypic variance is explained by genetic causes. Table 1 shows that the values of heritability and genetic variance were very close, both in the first evaluation at three years of age (83.73 and 82.63%; 0.000367 and 0.000341) and in the second evaluation at six years of age (80.15 and 79.71%; 0.002414 and 0.002394). Thus, for selection purposes, there would be no loss of gain in volume if it were estimated using the Artificial Neural Network. These facts show that the network was efficient in estimating individual wood volume, and that the results of the analyses of variance and of the descriptive analyses were consistent and very close.
Pearson's linear correlation between the variables was positive, high and significant (p<0.01) at both three and six years of age (Table 2). Correlation studies are very important for evaluating the magnitude and direction of the relationship between two characters and the gains to be obtained by indirect selection (Cruz 2006). In this study, a correlation above 0.98 shows that the values predicted by the network and the original volume are very close.
Moreover, Table 2 shows the values of Spearman's correlation, which evaluates changes in the ranks of the variables. In this study, the correlation was significant (p<0.01) and even greater than Pearson's correlation, reaching 0.99 at six years of age. This confirms that the ranking of the families was not substantially changed by using either variable.
The level of coincidence of the families in a sample of 20% of the treatments, that is, the 28 superior and the 28 inferior half-sib families, was calculated. For both ages, and for both the 28 superior and the 28 inferior half-sib families, more than 92% of the half-sib families were the same; that is, of the 28 superior half-sib families, 26 were the same under both methodologies. This was also observed for the 28 inferior half-sib families. ANN models have recently been used for classification of green tea accessions (Camellia sinensis L.) into different taxonomic groups, using measures of leaf morphology as input variables (Pandolfi et al. 2009).
Table 3 shows the predicted gain with selection for direct selection on the original volume and direct selection on the volume obtained with the network. Values of gain with selection were close for both ages. The predicted gain with selection was a small percentage. The ideal would be to obtain the realized gain, that is, the gain obtained after selection and planting of the selected individuals.
To complement the evaluation of gain with selection, the half-sib families that would be selected using each of the variables are listed in Table 3. Although the estimated gain percentage differed slightly between the two variables, the half-sib families selected by the two variables are basically the same at both ages. This confirms that selection based on the phenotype predicted by the neural network involves no loss of information for breeding purposes, since the phenotypically superior genotypes are the ones selected.
Another alternative for comparing network efficiency is regression analysis, which estimates the dependent variable (original volume) from the independent variable, so that the volume is estimated without using height for all individuals. The data used for this analysis were those used for network training. Table 4 shows that only the quadratic and cubic models equaled the value obtained by the artificial neural network (ANN) at three and six years of age. The other models had lower values than the ANN when compared by the coefficient of determination (R²). This is in agreement with Haykin (2001), who showed that neural networks may predict better than regression models. Starret et al. (1997) reported that ANN models performed better (r² = 0.984) than the regression model (r² = 0.780) for N used in gram. According to Batchelor et al. (1997), ANN models achieve better results than traditional statistical methods for prediction of soybean rust.
Therefore, the use of neural networks for prediction of wood volume in eucalyptus breeding programs is a useful and feasible methodology, since it does not require measuring the height of all trees. Thus, errors can be reduced, since the remaining measurements can be made with more care. The artificial neural network was at least as accurate as the regression models assessed, which shows its predictive power.
Many trees are cut down and scaled to carry out a forest inventory, but with an artificial neural network only a sample of trees needs to have its DBH and volume measured. Thus, a much more consistent measurement can be obtained, since the volumes will be better estimated by reducing measurement errors in the field. This model can help automate the forest inventory process, since it significantly reduces the cost and time needed to complete the inventory. In addition, this approach is less susceptible to human error during the inventory. Another advantage is that, since a sample of trees from the experiment itself is used for network training and subsequent prediction, there is no risk of disregarding differences between species or between the locations where the experiments were carried out. A network will be obtained for each experiment and/or location.
It is concluded that the Artificial Neural Network can help improve the accuracy of volume measurement in eucalyptus trees and automate the forest inventory process. When the Artificial Neural Network was used to predict wood volume, it was possible to select the best genotypes, as with the analysis of variance of the original volume. The Artificial Neural Network was more accurate than almost all the regression models.

Table 1.
Mean squares obtained by analysis of variance in randomized blocks with information between and within plots, mean of the experiment, minimum and maximum values, coefficient of environmental variation (CV), heritability on a half-sib family mean basis, and genetic variance between families for experiments measured at three and six years of age, for original volume and volume based on neural networks
** Significant at 1% probability by the F test

Table 2.
Spearman and Pearson correlations between original volume and volume based on artificial neural network for 140 half-sib families of eucalyptus assessed at three and six years of age

Table 3.
Selection gain (SG) predicted by direct selection for original volume and network estimated volume, with 14 selected individuals (10% of the population)

Table 4.
Regression analysis of several models with the data used for network training, and efficiency comparison based on the coefficient of determination (R²)