STUDY OF THE INFLUENCE OF WOOD PROPERTIES ON CHARCOAL PRODUCTION: APPLYING THE RANDOM FOREST ALGORITHM

ABSTRACT Understanding the relationship between the properties of wood and those of charcoal makes it possible to improve charcoal production. In this study, the random forest algorithm was therefore used to analyze the influence of eucalyptus wood properties on charcoal quality, and the accuracy of its predictions was compared with that of support vector regression and multiple linear regression. Six wood properties and six charcoal properties were measured in seven-year-old trees of the hybrid Eucalyptus grandis x Eucalyptus urophylla and of twelve clones of Corymbia torelliana x Corymbia citriodora. The relationship between wood and charcoal properties was evaluated with the mean decrease in node impurity (residual sum of squares) calculated by the random forest and with the copula correlation. The random forest was compared with support vector regression and multiple linear regression through the coefficient of determination, the linear correlation between observed and predicted values, the mean absolute error and the root mean squared error. The random forest was more accurate than support vector regression and multiple linear regression, mainly in terms of the coefficient of determination and the linear correlation between observed and predicted values. The yield and quality of the charcoal produced from the clones were mainly influenced by holocellulose content, heartwood/sapwood ratio and basic wood density. Apparent relative density of charcoal was the variable whose variability the random forest explained best as a function of wood properties, while the smallest error was observed for fixed carbon content.


1. INTRODUCTION
The heterogeneity of charcoal quality is one of the main problems faced by the steel industry in controlling the iron ore reduction process. Charcoal properties are influenced by the carbonization process and by inherent characteristics of the source material, such as species or genotype, chemical characteristics of the wood and tree age (Protásio et al. 2012; Soares et al. 2015).
Among the wood properties that influence charcoal quality, Oliveira et al. (2010) highlight basic density, calorific value, chemical composition and moisture content as the most important characteristics for selecting genotypes suitable for charcoal production. Soares et al. (2014) emphasize that knowledge of the relationship between the properties of wood and those of charcoal still needs to be deepened. Furthermore, a better understanding of the chemical reactions that occur in wood during carbonization is required.
The relationship between the characteristics of wood and charcoal has been studied by estimating the linear correlation coefficient (Santos et al. 2011; Medeiros Neto et al. 2014; Soares et al. 2014), by simple linear regression (Brito and Barrichelo 1980; Brand et al. 2013; Santos et al. 2016) and by canonical correlation analysis (Protásio et al. 2012; Castro et al. 2013).
Machine learning is a more complex approach to data analysis that combines statistical principles with computer programming. It allows the implementation of algorithms able to recognize patterns, learn and perform analyses automatically (Dantas 2017). Some widely used methods, such as regression analysis and multivariate techniques, can be included in this context (Biamonte et al. 2017). The main contribution of machine learning lies in the systematic advance of programming, which has enabled advanced and robust algorithms such as the random forest (RF).
The RF algorithm is based on decision/regression trees. According to Breiman (2001), when used for regression it provides a numerical estimate that is the average of the predictions of all k trees. In some cases, RF has proved more accurate than other algorithms, such as neural networks and support vector machines (Caruana et al. 2008). The performance of RF has been demonstrated in studies across very diverse areas of knowledge, such as remote sensing (Girolamo Neto et al. 2015), soil physics (Carvalho Júnior et al. 2016), health (Lento 2017) and electromechanical energy (Lopes 2017), among many others.
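In symbols, the regression estimate described above is a simple average. Writing \hat{T}_b(\mathbf{x}) for the prediction of the b-th tree for a vector \mathbf{x} of wood properties (notation introduced here for illustration), the forest prediction over k trees is given below. In Breiman's standard algorithm, each tree is grown on a bootstrap sample of the training data, with a random subset of predictors considered at each split.

```latex
\hat{f}_{\mathrm{RF}}(\mathbf{x}) = \frac{1}{k} \sum_{b=1}^{k} \hat{T}_b(\mathbf{x})
```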
In this study, the random forest algorithm was used to analyze the influence of selected wood properties on the quality of charcoal from Eucalyptus and Corymbia clones, and the accuracy of its predictions was compared with that of support vector regression (SVR) and multiple linear regression (MLR).

2. MATERIALS AND METHODS
The study used data acquired in an experimental field area located in the municipality of Dionísio, Minas Gerais. This region has a humid subtropical climate, with mean annual temperature between 20 and 23 °C and mean annual rainfall usually between 1100 and 1400 mm (Motta et al. 1996).
Thirteen eucalyptus genotypes, specifically one hybrid of Eucalyptus grandis x Eucalyptus urophylla and twelve clones of Corymbia torelliana x Corymbia citriodora, were planted in 2008 on a Red-Yellow Latosol with a spacing of 3.0 x 2.5 m. The trees were seven years old when they were felled for sampling. Samples were collected to analyze wood properties (heartwood/sapwood ratio, basic density, higher calorific value, total lignin content, holocellulose content and extractive content) and charcoal properties (higher calorific value, gravimetric yield of carbonization, friability, apparent relative density, volatile matter content and fixed carbon content).

Anatomical and physical properties of wood
The heartwood/sapwood ratio was measured according to Castro et al. (2013). Basic density was measured according to ABNT NBR 11941 (2003), and the higher calorific value was determined according to ABNT NBR 8633 (1984).
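Basic density is conventionally the oven-dry mass of a specimen divided by its saturated (green) volume. The sketch below only illustrates that arithmetic and does not reproduce the full NBR 11941 procedure. It assumes the saturated volume was obtained by immersion, with water taken as 1 g/cm³, so that the displaced water mass in grams equals the volume in cm³. The function names and specimen values are hypothetical.

```python
WATER_DENSITY_G_PER_CM3 = 1.0  # assumption: water density taken as 1 g/cm3


def saturated_volume_cm3(displaced_water_mass_g):
    """Volume of a water-saturated specimen from the mass of water it displaces."""
    return displaced_water_mass_g / WATER_DENSITY_G_PER_CM3


def basic_density(oven_dry_mass_g, displaced_water_mass_g):
    """Basic density (g/cm3): oven-dry mass over saturated volume."""
    return oven_dry_mass_g / saturated_volume_cm3(displaced_water_mass_g)


# Hypothetical specimen: 5.5 g oven-dry, 10.0 g of water displaced when saturated.
print(basic_density(5.5, 10.0))  # 0.55 g/cm3, i.e. 550 kg/m3
```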

Analysis of the chemical composition of wood
The total extractive content was determined according to the TAPPI T 204 cm-97 standard (TAPPI 1997), and the total lignin content was quantified according to Gomide and Demuner (1986) and Goldschimid (1971). The holocellulose content was calculated by subtracting the percentages of total lignin and extractives from 100%.
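The holocellulose calculation described above is a difference. In this minimal sketch, the function name and input percentages are hypothetical and are not results from this study.

```python
def holocellulose_by_difference(total_lignin_pct, extractives_pct):
    """Holocellulose (%) = 100 - total lignin (%) - total extractives (%)."""
    if not 0 <= total_lignin_pct + extractives_pct <= 100:
        raise ValueError("lignin + extractives must lie between 0 and 100%")
    return 100.0 - total_lignin_pct - extractives_pct


# Hypothetical values, not results from this study.
print(holocellulose_by_difference(30.0, 3.5))  # 66.5
```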

Gravimetric yield and charcoal properties
The gravimetric yield of carbonization was determined as the ratio between the charcoal mass and the dry wood mass. The apparent relative density of charcoal was determined by the hydrostatic method, according to Vital (1984). Friability was obtained according to the methodology proposed by Oliveira et al. (1982). Volatile matter and ash contents were measured according to NBR 8112 (ABNT 1986). Fixed carbon content was calculated by subtracting the volatile matter and ash contents from 100%, and the higher calorific value was measured according to NBR 8633 (ABNT 1984).
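The two derived charcoal quantities above reduce to simple arithmetic. This sketch assumes yield is expressed as a percentage. The function names and the masses and contents in the example are hypothetical.

```python
def gravimetric_yield_pct(charcoal_mass_g, dry_wood_mass_g):
    """Gravimetric yield of carbonization (%): charcoal mass relative to dry wood mass."""
    return 100.0 * charcoal_mass_g / dry_wood_mass_g


def fixed_carbon_pct(volatile_matter_pct, ash_pct):
    """Fixed carbon (%) by difference: 100 - volatile matter (%) - ash (%)."""
    return 100.0 - volatile_matter_pct - ash_pct


# Hypothetical carbonization of 1000 g of dry wood yielding 350 g of charcoal.
print(gravimetric_yield_pct(350.0, 1000.0))  # 35.0
print(fixed_carbon_pct(22.0, 0.5))  # 77.5
```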

Parameterized algorithms
All analyses were performed with statistical packages of the R software. The RF regression trees were built with the randomForest function, part of the package of the same name (R Core Team 2018). The three basic parameters of the function (Breiman 2002) are the number of trees, the number of variables randomly selected at each node and the number of terminal nodes. These were set to 500 trees, four variables per node and five terminal nodes, respectively. SVR was trained with a radial basis function (RBF) kernel through the train function of the caret package (R Core Team 2018). The train function was also used to fit the MLR.
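The following is a minimal, from-scratch random forest regressor in plain Python that makes the three parameters concrete. It is not the randomForest package itself. Each tree is grown on a bootstrap sample. At each node, mtry predictors are drawn at random and the split that most reduces the residual sum of squares (RSS) is chosen. Growth stops once the tree has max_leaves terminal nodes.

The names n_trees, mtry and max_leaves are hypothetical stand-ins for the package arguments ntree, mtry and maxnodes. Growing each tree best-first (always splitting the leaf with the largest RSS reduction) is an assumption and may differ from the package's node ordering. The RSS decrease of every split is also accumulated per predictor and averaged over trees, which is the mean decrease in node impurity used for regression forests.

```python
import random
from collections import defaultdict


def rss(values):
    """Node impurity for regression: residual sum of squares around the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, idx, features):
    """Best RSS-reducing split over the given features; None if no split is possible."""
    parent = rss([y[i] for i in idx])
    best = None
    for f in features:
        # candidate thresholds: every distinct value except the largest
        for t in sorted({X[i][f] for i in idx})[:-1]:
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            gain = parent - rss([y[i] for i in left]) - rss([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    return best


def grow_tree(X, y, idx, mtry, max_leaves, rng, importance):
    """Grow one regression tree best-first until it has max_leaves terminal nodes."""
    n_features = len(X[0])

    def make_node(node_idx):
        # a fresh random subset of mtry predictors is drawn at every node
        feats = rng.sample(range(n_features), min(mtry, n_features))
        return {"idx": node_idx, "split": best_split(X, y, node_idx, feats)}

    root = make_node(idx)
    leaves = [root]
    while len(leaves) < max_leaves:
        splittable = [lf for lf in leaves if lf["split"] is not None]
        if not splittable:
            break
        node = max(splittable, key=lambda lf: lf["split"][0])
        gain, f, t, left, right = node["split"]
        importance[f] += gain  # RSS decrease credited to predictor f
        node.update(feature=f, threshold=t, left=make_node(left), right=make_node(right))
        leaves = [lf for lf in leaves if lf is not node] + [node["left"], node["right"]]
    for lf in leaves:
        lf["value"] = sum(y[i] for i in lf["idx"]) / len(lf["idx"])  # leaf mean
    return root


def predict_tree(node, x):
    """Follow the splits down to a terminal node and return its mean."""
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def random_forest(X, y, n_trees=500, mtry=4, max_leaves=5, seed=1):
    """Return the trees and the mean decrease in node impurity (RSS) per predictor."""
    rng = random.Random(seed)
    n = len(y)
    importance = defaultdict(float)
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        trees.append(grow_tree(X, y, boot, mtry, max_leaves, rng, importance))
    mdni = {f: total / n_trees for f, total in importance.items()}
    return trees, mdni


def predict_forest(trees, x):
    """Regression forest prediction: the average of the k tree predictions."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


# Usage on synthetic data (not the study's data): y depends only on predictor 0.
data_rng = random.Random(0)
X = [[data_rng.random() for _ in range(6)] for _ in range(36)]
y = [10 * row[0] + 0.1 * data_rng.random() for row in X]
trees, mdni = random_forest(X, y, n_trees=50)
print(max(mdni, key=mdni.get))  # predictor with the largest MDNI
```

The usage example uses fewer trees than the 500 reported in the study only to keep this pure-Python sketch fast.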
The wood properties were ranked by the importance of their influence on the charcoal features using the mean decrease in node impurity (MDNI) of the trees in the RF algorithm, calculated from the residual sum of squares. In addition, the degree of dependence between wood and charcoal properties was measured by Spearman's rank correlation coefficient (Spearman's ρ) calculated with an elliptical copula. The pobs function (R Core Team 2018) was used to convert the data to a uniform distribution. This coefficient is equivalent to Pearson's correlation coefficient (Pearson's r) applied to ranks, with the additional feature of also capturing non-linear monotonic relationships (Ding and Li 2013). The trees were then retrained using only the most important variables, in order to measure the influence of these predictors on the yield and quality of charcoal through the coefficient of determination (R²).
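The rank-based dependence measure can be sketched as follows. Pseudo-observations are ranks divided by n + 1, which is what the pobs function of the R copula package computes. Here, ties receive their average rank, which is assumed to match that function's default. Pearson's correlation of the pseudo-observations then equals Spearman's ρ, because rescaling ranks by a constant does not change a correlation. This is a plain-Python illustration, not the elliptical-copula fit used in the study, and the data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pobs(values):
    """Pseudo-observations in (0, 1): rank / (n + 1)."""
    n = len(values)
    return [r / (n + 1) for r in average_ranks(values)]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((z - mb) ** 2 for z in b))


def spearman_rho(a, b):
    """Spearman's rho as Pearson's r of the pseudo-observations."""
    return pearson(pobs(a), pobs(b))


# Hypothetical monotonic but non-linear relationship.
wood = [1.0, 2.0, 3.0, 4.0, 5.0]
charcoal = [2.0, 4.0, 9.0, 16.0, 100.0]
print(round(pearson(wood, charcoal), 3), spearman_rho(wood, charcoal))
```

Pearson's r stays well below 1 for this curved relationship, while the rank-based coefficient reaches 1 because the relationship is strictly increasing.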
To evaluate the effectiveness of RF relative to SVR and MLR, the data were randomly divided into 75% for training and 25% for validation. The methods were compared on their performance on the validation data, following the criteria used by Hallak and Pereira Filho (2011), Aitkenhead and Coull (2016), Carvalho Junior et al. (2016) and Malone et al. (2016). These criteria are the coefficient of determination (R²), Pearson's correlation coefficient between observed and predicted values (ryŷ), the mean absolute error (MAE) and the root mean squared error (RMSE). The accuracy criteria were calculated with the postResample function of the caret package (R Core Team 2018).
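The validation protocol can be sketched as a random 75/25 split followed by the four accuracy criteria. caret's postResample computes its Rsquared as the squared correlation between observed and predicted values, so under that definition R² = ryŷ². The sketch assumes that definition. The total of 48 observations in the example is an assumption chosen to be consistent with the 12 validation observations reported in the results. The function names are hypothetical.

```python
import math
import random


def split_train_validation(n, train_fraction=0.75, seed=1):
    """Randomly assign row indices 0..n-1 to training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return sorted(idx[:cut]), sorted(idx[cut:])


def accuracy(obs, pred):
    """r_yyhat, R2 (as the squared correlation), MAE and RMSE on validation data."""
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    return {"r": r, "R2": r * r, "MAE": mae, "RMSE": rmse}


train, validation = split_train_validation(48)
print(len(train), len(validation))  # 36 12
print(accuracy([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5]))
```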

3. RESULTS

Wood properties vs yield and quality of charcoal
According to the MDNI measurements, the holocellulose content of wood was the variable that most influenced carbonization yield, volatile matter content and friability of charcoal. The heartwood/sapwood ratio was the most important variable for the fixed carbon content of charcoal. For the apparent relative density and higher calorific value of charcoal, basic wood density was the most influential property. The influence of the wood variables on charcoal is shown in the MDNI graph (Fig. 1): the further to the right a variable's point lies, the more explanatory that variable is in the algorithm. Table 1 shows that all relationships were inverse except for two: friability as a function of holocellulose content, and apparent relative density of charcoal as a function of basic wood density. The coefficients of determination (Table 1) indicate that most of the variability of the charcoal properties is explained by the most important wood variables in the RF modeling.

Performance of the models
The performance of the RF algorithm in terms of the coefficient of determination (R²), linear correlation between observed and predicted values (ryŷ), mean absolute error (MAE) and root mean squared error (RMSE) was superior to that of the other algorithms tested for all charcoal variables (Tables 2 to 4). The performance results presented in this paper refer to the validation data, which in this study comprise only 12 observations, a small sample. The R² and ryŷ values of the RF algorithm were relatively high. Most were above or close to 90% when all independent variables were included, and all remained above 60% in the estimates containing only the most influential independent variable (Tables 2 and 3). In both estimation cases, the r values for RF were significant, which was not the case to the same extent for SVR and MLR (Table 3).
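The paper does not state which test was used for the significance of r. The sketch below assumes the usual t test for a Pearson correlation, with n = 12 validation observations and n − 2 = 10 degrees of freedom. It computes the t statistic and the smallest |r| that is significant at the 5% level, using a two-tailed critical t of about 2.228 for 10 df from standard t tables. Under these assumptions, |r| must exceed roughly 0.58 to be significant, which shows how demanding a 12-observation validation set is.

```python
import math

T_CRIT_5PCT_10DF = 2.228  # two-tailed 5% critical value of Student's t with 10 df


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| whose t statistic reaches t_crit for a sample of size n."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


n_validation = 12
print(round(critical_r(T_CRIT_5PCT_10DF, n_validation), 3))  # 0.576
print(correlation_t(0.60, n_validation) > T_CRIT_5PCT_10DF)  # True
```

The expression for critical_r follows from solving t = r√(n − 2)/√(1 − r²) for r.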
The best results in terms of R² and ryŷ were obtained when estimating the apparent relative density of charcoal. On the other hand, the MAE and RMSE measures (Table 4) show that accuracy was highest when estimating fixed carbon. The MAE and RMSE results also suggest that the differences in accuracy between RF and SVR or MLR are narrow, mainly in the prediction of fixed carbon and carbonization yield. The different interpretations that can be drawn from R², ryŷ, MAE and RMSE may create uncertainty about which measure is most appropriate for comparing the evaluated methods. In this case, it is important to emphasize that MAE and RMSE measure the difference between observed and predicted values directly. Consequently, these error measures are more reliable for comparing and evaluating the different prediction methods.
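A small hypothetical example shows why correlation-based and error-based measures can disagree. Predictions that are uniformly 5 units too high correlate perfectly with the observations (ryŷ = 1, so a squared-correlation R² is also 1). Yet MAE and RMSE are both 5, and the residual-based coefficient of determination 1 − SSE/SST drops to 0.8. The numbers are illustrative only.

```python
import math

obs = [10.0, 20.0, 30.0, 40.0]
pred = [o + 5.0 for o in obs]  # every prediction is 5 units too high

n = len(obs)
mo, mp = sum(obs) / n, sum(pred) / n
cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
sse = sum((o - p) ** 2 for o, p in zip(obs, pred))  # residual sum of squares
sst = sum((o - mo) ** 2 for o in obs)  # total sum of squares

print(r, r ** 2)  # 1.0 1.0
print(mae, rmse)  # 5.0 5.0
print(1 - sse / sst)  # 0.8
```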

4. DISCUSSION
Unlike the high dependence (R² = 61.65%) observed here between carbonization yield and wood holocellulose content, Santos (2008) obtained a low linear correlation (r = -0.15) between these two variables. This low correlation is consistent with the work of Soares et al. (2014), in which carbonization yield depended more strongly on the higher calorific value of the wood. In practice, carbonization yield is usually higher when wood has a higher lignin content, since this macromolecule is more stable to thermal degradation than the other chemical compounds present (Trugilho et al. 2001; Pereira et al. 2013).

Table 1 - Spearman's rank correlation coefficient (ρ) and coefficient of determination (R²) of the random forest algorithm (RF) implemented only with the most influential properties for the yield and quality of charcoal produced with eucalyptus clones.
The linear correlation between holocellulose and volatile matter content was almost null in the study by Soares et al. (2014). The main influence on the volatile matter of charcoal reported by those authors was the carbon/hydrogen ratio of the wood. According to Oliveira et al. (2010), volatile content also depends on the carbonization temperature and heating rate. Brito and Barrichelo (1977) observed that more lignified eucalyptus wood produces charcoal with a higher fixed carbon content. According to the authors, this is a direct consequence of the composition of lignin, as this polymer contains approximately 65% elemental carbon.

Table 3 - Coefficient of linear correlation between observed and predicted values (ryŷ) for the random forest (RF), support vector regression (SVR) and multiple linear regression (MLR) considering six wood predictor variables (1st estimate) and only the most important variable (2nd estimate), for validation data.

Table 4 - Mean absolute error (MAE) and root mean squared error (RMSE) for the random forest (RF), support vector regression (SVR) and multiple linear regression (MLR) using six wood predictor variables (1st estimate) and only the most important variable (2nd estimate), considering validation data.
Although the results indicate an inverse relationship between fixed carbon and heartwood/sapwood ratio, wood with a greater heartwood/sapwood ratio can increase the fixed carbon content of charcoal. This is because heartwood tends to have a lignin content greater than, or equal to, that of sapwood (Klitzke et al. 2008; Costa et al. 2017; Fonte et al. 2017). It is also important to emphasize that the production process is the factor that most affects the fixed carbon content of charcoal (Róz et al. 2015).
Previous studies that evaluated the friability of charcoal produced from different species attribute the highest fines generation mainly to log diameter, carbonization time and wood moisture (Coutinho and Ferraz 1988; Silva 1988; Pinheiro 2013). Coutinho and Ferraz (1988) state that fines generation is caused by internal stresses formed during moisture loss. These stresses are directly influenced by the heartwood area and by the variation in density between the pith and the bark.
The variation of the apparent density of charcoal as a function of the basic density of wood is well known. Brito and Barrichelo (1980) demonstrated it with a simple linear equation that reached an R² of 97% for this relationship. The higher the density of the wood, the greater the density and mechanical strength of the charcoal. These characteristics, together with particle size (granulometry), are the most important for blast furnace operation in the steel industry (Brito 1993; Pereira 2012).
Although the higher calorific value of charcoal as a function of the basic density of wood presented a high R², no research was found that demonstrates a direct relationship between these two variables. According to Couto (2014), the higher calorific value of charcoal depends on the elemental chemical composition of the material. The higher calorific value can be increased by raising the carbonization temperature, which also raises the fixed carbon content through the elimination of volatile matter (Figueiredo et al. 2018).
In contrast to the present study, Montaño (2016) obtained better results with SVR than with RF in predicting the volume of Pinus taeda and in projecting the biomass and height of Acacia mearnsii. Carvalho Junior et al. (2016) obtained better results with MLR than with RF in estimating soil density as a function of the physicochemical properties of the samples. Rodríguez-Lado et al. (2015), on the other hand, obtained better results by applying RF to estimate soil density as a function of organic matter and texture than by applying artificial neural networks and MLR. This divergence of outcomes, and the lack of consensus on which technique or algorithm is most efficient across different areas of knowledge, highlights the importance of testing different methods and parameter settings. Only then is it possible to define the set of procedures most appropriate for each research field.

5. CONCLUSIONS
Based on the random forest algorithm, the yield and quality of the charcoal produced from eucalyptus clones were mainly influenced by holocellulose content, heartwood/sapwood ratio and basic wood density.
Apparent relative density of charcoal was the variable whose variability the random forest algorithm explained best as a function of wood properties. It was also the variable with the greatest reduction in prediction error compared with the other two techniques tested. Fixed carbon content showed the smallest error in the algorithm's predictions.
The random forest algorithm was more accurate than support vector regression and multiple linear regression. Therefore, the yield and quality properties of charcoal can be estimated from wood properties through machine learning using the random forest. Nevertheless, considering the error measures, support vector regression and multiple linear regression also provide accurate predictions for most of the dependent variables.

AUTHOR CONTRIBUTIONS
Kaléo D. Pereira: data analysis and writing; Antônio P. S. Carneiro: research supervision and text review; Gerson R. Santos: conception and data analysis; Angélica C. O. Carneiro: technical review; Hélio G. Leite: technical review; Felipe P. Borges: text review and translation.

ACKNOWLEDGMENTS
To the company ArcelorMittal BioFlorestas and to the Laboratory of Wood Panels and Energy (LAPEM) of the Federal University of Viçosa (UFV) for providing the wood samples and carrying out the laboratory analyses, respectively. To the Coordination for the Improvement of Higher Education Personnel (CAPES) for granting a postgraduate scholarship (Financing Code 001).