Alternatives to Growth and Yield Prognosis for Pinus caribaea var. caribaea Barrett & Golfari

The objective of this study was to obtain regression equations and artificial neural networks (ANNs) for prediction and prognosis of the yield of Pinus caribaea var. caribaea Barrett & Golfari. The data used for modeling come from measurements of diameter at breast height (DBH) and total height (Ht) in 550 temporary plots and 14 circular permanent plots of 500 m² in Pinus caribaea var. caribaea plantations aged between 3 and 41 years. In growth prediction, the results indicated the Schumacher model as the best fit to the data. In prognosis, the modified Buckman system performed better than Clutter's. ANNs performed similarly to the Buckman model in volume prognosis but were superior in basal area prognosis.


INTRODUCTION AND OBJECTIVES
Mathematical models are not new in forestry and are one of the most important approaches to the study of forest dynamics. In these studies, present estimates (predictions) and future estimates (prognoses) made with modeling techniques, at both the tree and stand levels, are essential steps in planning forestry activities (Prodan et al., 1997).
Mathematical modeling refers to the development or adjustment of mathematical expressions that describe the behavior of a variable of interest. Regression analysis, a statistical technique whose name is attributed to the British anthropologist Francis Galton (Draper & Smith, 1998), is the most used technique in empirical modeling research, especially when the objective is to describe an existing but hidden relation between a set of independent variables and a dependent variable (Pardoe, 2012).
Equations, the main results of the regression analysis, help forest researchers and managers to forecast future forest yields to select better management options, appropriate silviculture alternatives or to plan forest harvest frequencies and sequences (Burkhart & Tomé, 2012).
Regarding the difference between prediction and prognosis models, prognoses are performed by regression models in the form of equation systems that estimate the parameters of the function used to project production to future ages (Castro et al., 2013). Prediction models, in turn, can be defined as functions that simply describe the change in the size of an individual (tree) or a population (stand) over time (age) (Burkhart & Tomé, 2012).
From the perspective of the input variable components for the models, Binoti et al. (2015) assert that prediction is carried out by models that have age as an independent variable, while prognosis is performed by models in which future production is projected as a function of current production among other variables. The errors associated with these prognosis models grow over time, and considering the long horizons of the planning of forest productive processes, making precise forecasts has become the main challenge of forest yield models.
In the last decades, the need for more accurate estimates has led to techniques such as artificial neural networks (ANNs) becoming popular for forest measurement. Due to their effectiveness in understanding complex systems, these modeling techniques are used as alternatives to the adjustment of traditional nonlinear regression models (Özçelik et al., 2017). The ANNs can be defined as mathematical models that have the functioning of the human brain with its biological neural networks as a metaphor (Valença, 2010).
Growth and yield modeling of forest plantations using regression analysis has been addressed in numerous studies, notably those by Schumacher (1939), Buckman (1962) and Clutter (1963). In applied regression analysis, we highlight authors such as Draper & Smith (1998) and Pardoe (2012), whose work complements the specific forest literature. Ashraf et al. (2013), Castro et al. (2013) and Özçelik et al. (2014) are among the studies that applied ANNs for the same purpose.
Given the above, this study sought to fit regression models and train ANNs for the prediction and prognosis of Pinus caribaea var. caribaea growth and yield at Macurije forest company, Pinar del Río, Cuba.

Geographical location of the study area
This study was carried out in Pinus caribaea var. caribaea plantations of the Macurije forest company, located between 22°06' and 22°42' North latitude and 83°48' and 84°23' West longitude, in the westernmost region of the province of Pinar del Río, Cuba (Figure 1).

Data sources and analysis of sample sufficiency
The database consisted of 550 temporary plots and 14 circular permanent plots of 500 m² in plantations of Pinus caribaea var. caribaea with ages ranging from 3 to 41 years. Temporary plots were selected by random sampling throughout the company. The permanent plots, distributed across the company's two silvicultural units (Guane and Mantua), were established and monitored until 2006, and six consecutive measurements were made. In the plots, the variables age (A), diameter at breast height (DBH) and total height (Ht) were measured, and the yields represented by basal area (G) and volume (V) were calculated.
Alternatives to Growth and Yield ... Floresta e Ambiente 2019; 26(4): e20170381
Sample sufficiency analysis was performed using sampling error, based on the random sampling procedure in an infinite population, with an acceptable error of 10% and a 95% probability level.
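The sufficiency check described above can be sketched as follows. This is a minimal illustration, assuming the usual relative sampling error for simple random sampling without a finite-population correction. The function name and the plot volumes are hypothetical, and the normal quantile is used as a stand-in for Student's t, which is a close approximation only for large samples such as the 550 plots used here.

```python
import math
from statistics import NormalDist, mean, stdev


def sampling_error_percent(values, confidence=0.95):
    """Relative sampling error (%) for simple random sampling, infinite population."""
    n = len(values)
    # Standard error of the mean, with no finite-population correction.
    std_error = stdev(values) / math.sqrt(n)
    # Normal quantile as an approximation to Student's t (assumption).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 100 * z * std_error / mean(values)


# Hypothetical plot volumes (m³/ha), not data from the study.
plot_volumes = [182.0, 240.5, 205.3, 310.8, 275.1, 198.4, 226.9, 260.2]
error = sampling_error_percent(plot_volumes)
print(f"sampling error: {error:.2f}%  within 10% limit: {error <= 10}")
```

The sample is judged sufficient when the computed error does not exceed the acceptable error (10% in this study).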

Growth and yield models fitted for plantations of Pinus caribaea var. caribaea
The selected growth and yield models (Table 1) were first fitted to the whole stand, and the model that best fit the data was then refitted by site class.

Artificial neural networks (ANNs) training for yield prediction and prognosis
One hundred ANNs of the multilayer perceptron (MLP) and radial basis function (RBF) types were trained for both growth prediction and yield prognosis, and the two best were retained for analysis. The variables, the training algorithm and the activation functions tested are listed in Table 2.
The dataset was divided into three parts: 50% for training, 25% for testing and 25% for cross-validation. The variables were normalized by linear transformation at
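The partition and normalization can be sketched as below. This is only an illustration: the function names are hypothetical, the random shuffle is an assumption about how records were assigned, and the target interval of the linear transformation is not recoverable from the text, so the [0, 1] default is an assumption as well.

```python
import random


def split_dataset(records, seed=0):
    """Shuffle and split records into 50% training, 25% test, 25% validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_test = len(shuffled) // 4
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values onto [low, high]; the interval is an assumption."""
    v_min, v_max = min(values), max(values)
    return [low + (high - low) * (v - v_min) / (v_max - v_min) for v in values]


ages = list(range(3, 42))  # ages 3 to 41 years, the range reported in the text
train, test, validation = split_dataset(ages)
scaled_ages = min_max_scale(ages)
```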

Parameters estimation and models (regression and ANNs) selection criteria
The regression models were fitted and the ANNs trained with the Statistica 8.0 and SPSS 20.0 software packages. The linear models were fitted by ordinary least squares (OLS) and the nonlinear models by the Levenberg-Marquardt, Gauss-Newton or Newton-Raphson iterative methods. The prognosis models were fitted by two-stage least squares (2SLS), since they were exactly identified simultaneous equation systems.
The quality of the fits was evaluated using the following criteria: adjusted coefficient of determination (R²adj), standard error of the estimate (Syx), root mean square error (RMSE) and analysis of the residual distributions to detect possible estimation trends in the equations obtained. The assumptions of normality, homoscedasticity and absence of serial autocorrelation of the residuals were verified by the Kolmogorov-Smirnov, White and Durbin-Watson tests, respectively.
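The numerical criteria above can be computed as in the following sketch, which uses their textbook definitions. The function name and the sample values are hypothetical, n_params is assumed to count every estimated coefficient including the intercept, and the statistics are returned in absolute units rather than percentages.

```python
import math


def fit_statistics(observed, estimated, n_params):
    """Return adjusted R², Syx, RMSE and the Durbin-Watson statistic."""
    n = len(observed)
    residuals = [o - e for o, e in zip(observed, estimated)]
    mean_obs = sum(observed) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    # Penalize R² for the number of estimated parameters.
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_params)
    syx = math.sqrt(ss_res / (n - n_params))
    rmse = math.sqrt(ss_res / n)
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_res
    return {"r2_adj": r2_adj, "syx": syx, "rmse": rmse, "durbin_watson": dw}


# Hypothetical observed and estimated volumes (m³/ha), not data from the study.
observed = [50.0, 120.0, 180.0, 240.0, 290.0, 330.0]
estimated = [55.0, 115.0, 185.0, 235.0, 295.0, 325.0]
stats = fit_statistics(observed, estimated, n_params=2)
```

The Kolmogorov-Smirnov and White tests are not reproduced here; they require reference distributions and auxiliary regressions that are better taken from a statistics package.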
In cases of violation of the first two assumptions, logarithmic transformation was applied. For models that underwent such a transformation, it was necessary to correct the logarithmic discrepancy with the Meyer correction factor as well as recalculate the residual standard error. The problem of the serial autocorrelation of residuals was addressed by the Cochrane-Orcutt method (Cochrane & Orcutt, 1949).
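As an illustration of the back-transformation step, the sketch below applies the Meyer correction factor in the form in which it is usually presented in the forest mensuration literature, exp(0.5 · Syx²), with Syx expressed in natural-log units. The function names and numerical values are hypothetical.

```python
import math


def meyer_correction_factor(syx_log):
    """Meyer factor exp(0.5 * Syx²), with Syx in natural-log units."""
    return math.exp(0.5 * syx_log ** 2)


def back_transform(ln_estimate, syx_log):
    """Return the estimate in original units, corrected for log bias."""
    return math.exp(ln_estimate) * meyer_correction_factor(syx_log)


# Hypothetical values: a fitted ln(V) of 5.5 and Syx of 0.12 on the log scale.
volume = back_transform(5.5, 0.12)
```

Because the factor is always at least 1, the corrected estimate is never smaller than the naive antilog, which offsets the underestimation introduced by fitting on the logarithmic scale.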
The validation of regression equations and trained ANNs was performed by comparing their estimates with the observed values. The univariate comparisons were performed using the statistical procedure proposed by Leite & Oliveira (2002), testing the hypothesis H₀: the observed values are equal to the values estimated by the regression equations or the ANNs. This procedure combines Graybill's F(H₀) test, the t-test for the mean error and the linear correlation (r) between the observed and estimated values.
In order to validate the models (regression equations and ANNs) adjusted for the simultaneous prognosis of production variables (basal area and volume), multivariate comparisons between the observed values and those estimated by the models were performed through the Hotelling T² test, using the procedure proposed by Balci and Sargent (1982).
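One common form of such a multivariate comparison is a one-sample Hotelling T² test on the paired differences between observed and estimated values. The sketch below implements that form for two variables. It is an assumption that this matches the variant used in the Balci and Sargent (1982) procedure, and the function name and data are hypothetical.

```python
def hotelling_t2_paired(observed, estimated):
    """One-sample Hotelling T² on paired differences of two variables.

    observed and estimated are lists of (basal_area, volume) tuples.
    Returns (T², F), where F has (2, n - 2) degrees of freedom.
    """
    n = len(observed)
    d1 = [o[0] - e[0] for o, e in zip(observed, estimated)]
    d2 = [o[1] - e[1] for o, e in zip(observed, estimated)]
    m1, m2 = sum(d1) / n, sum(d2) / n
    # Sample covariance matrix of the differences.
    s11 = sum((a - m1) ** 2 for a in d1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in d2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(d1, d2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Quadratic form m' S^-1 m using the closed-form 2x2 inverse.
    quad = (s22 * m1 ** 2 - 2 * s12 * m1 * m2 + s11 * m2 ** 2) / det
    t2 = n * quad
    # Conversion to F for p = 2 variables: (n - p) / (p (n - 1)) * T².
    f_stat = (n - 2) / (2 * (n - 1)) * t2
    return t2, f_stat


# Hypothetical (G in m²/ha, V in m³/ha) pairs, not data from the study.
observed = [(20.1, 150.2), (25.3, 198.7), (30.8, 251.0), (35.2, 300.4), (40.6, 349.9)]
estimated = [(19.8, 152.0), (25.9, 196.5), (30.1, 253.4), (35.9, 298.8), (40.2, 351.1)]
t2, f_stat = hotelling_t2_paired(observed, estimated)
```

A non-significant F indicates that the mean vector of differences cannot be distinguished from zero, that is, the model estimates of basal area and volume jointly agree with the observations.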

Estimates of the parameters of growth and yield models
The sampling error of 2.19%, corresponding to a pilot sample of 550 plots, was below the allowable error of 10%, indicating that the sample was sufficient to estimate volume with the required precision. Table 3 shows the parameter estimates of each model. All the fitted equations indicate rotation ages between 30 and 35 years for the species in the company. The agreement of these rotation ages with those found by Barrero et al. (2011) indicates that the parameter estimates are consistent. These results and the high coefficients of determination and smaller standard error of the estimates (Table 3)
The Kolmogorov-Smirnov tests indicated that only the residuals of the Schumacher, Logistic and Silva-Bailey models followed a normal distribution (p-value > 0.05), a necessary condition for the t and F parametric tests of the significance of the models and their parameters to be reliable.
The results of the Durbin-Watson test indicated that only the Schumacher model had uncorrelated residuals. The Chapman-Richards, Silva-Bailey and Logistic models presented negative serial autocorrelation, and the Korf model presented positive autocorrelation.
The White test results (p-value > 0.05), confirmed by the residuals distributions (Figure 2), indicated that only the Schumacher and Korf models met the homoscedasticity assumption. The periodic or sinusoidal distribution of the logistic model residuals indicates its inadequacy for the data. This latter model and Chapman-Richards's model showed a tendency to overestimate smaller volumes.
Including the site index in the Schumacher (1939) model for volume prediction by productive capacity generated inconsistent results, so the model was instead fitted by site class. These fits allowed relative control of site as a source of variation, with good results despite the reduced sample size per site (Table 4).
The assumption of normality was met only by the residuals of the last three sites (p-value > 0.05), so logarithmic transformation was performed, which solved the problem. The results of the Durbin-Watson test indicated positive serial autocorrelation in the residuals of all models. Applying the Cochrane & Orcutt (1949) procedure eliminated the problem, and the resulting equations showed good precision and biological consistency (Table 4). The results of the White test (p-value > 0.05) indicated compliance with the assumption of homoscedasticity in all equations. The Schumacher equation indicated a yield of 375.73 m³/ha, corresponding to a mean annual increment (MAI) of 11.05 m³/ha/year. The estimates obtained from the Schumacher equations by site class (Table 4) are biologically consistent: the negative of the coefficient β₁ (the rotation age) decreases as site quality improves, and productivity tends to increase in the same direction. Accordingly, MAIs of 6.37 m³/ha/year, 10.96 m³/ha/year, 12.01 m³/ha/year, 12.65 m³/ha/year and 13.21 m³/ha/year were recorded for sites V, IV, III, II and I, respectively.
With the exception of site V, whose productivity was low and similar to that reported by Aldana et al. (2006) for the species in the company's planning (6.50 m³/ha/year), and site I, whose productivity was above 13 m³/ha/year, the MAIs are consistent with the results of Barrero et al. (2011), who found MAIs between 10 m³/ha/year and 12 m³/ha/year. The technical rotation ages (TRAs) indicated by the fitted equations (Tables 3 and 4) also correspond to the TRAs between 30 and 35 years found by these authors.
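The link between the Schumacher coefficient and the rotation age can be checked numerically. The sketch below assumes the customary Schumacher form ln V = β₀ + β₁/A, under which the mean annual increment V/A peaks at A = −β₁. The reported yield of 375.73 m³/ha and MAI of 11.05 m³/ha/year imply an age of about 375.73 / 11.05 ≈ 34 years, so the coefficients below are back-calculated to reproduce those two figures. They are illustrative only and are not the estimates of Table 3.

```python
import math


def schumacher_volume(age, b0, b1):
    """Yield (m³/ha) from the Schumacher form ln V = b0 + b1 / age."""
    return math.exp(b0 + b1 / age)


def mean_annual_increment(age, b0, b1):
    """Mean annual increment (m³/ha/year) at a given age."""
    return schumacher_volume(age, b0, b1) / age


# Illustrative coefficients chosen so that V(34) = 375.73 m³/ha;
# they are NOT the fitted values reported in the paper.
b1 = -34.0
b0 = math.log(375.73) + 1.0

# Setting d(V/A)/dA = 0 gives A = -b1, the age of maximum MAI.
rotation_age = -b1
mai_at_rotation = mean_annual_increment(rotation_age, b0, b1)
```

Under these assumptions the MAI at 34 years is about 11.05 m³/ha/year and is lower at neighboring ages, which is the sense in which the negative of β₁ acts as the technical rotation age.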

Equations for growth and yield prognosis in Pinus caribaea var. caribaea plantations
In the Clutter equations (Table 5), the negative sign of the estimate of parameter β₁ indicates the consistency of the volume estimates. On the other hand, the negative sign of the estimate of parameter α₁ (α₁ = −0.091) in the basal area projection equation indicates that the effect of the site index (S) on basal area was inconsistent (Table 5). In this case, Campos & Leite (2017) recommend replacing S in the term (1 − A₁/A₂)S with ln G₁, (ln G₁)² or Hd₁.
This substitution made no statistical contribution, so we opted to eliminate the term, as recommended by the same authors and adopted by Dias et al. (2005). The basal area prognosis equation was thus reduced to the form presented in Equation 5.
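For orientation, the Clutter (1963) system is usually written in the forestry literature as the two simultaneous equations below, followed here by the basal area equation with the site term removed. The notation is the customary one and is an assumption, since the fitted equations themselves are not reproduced in this section; subscripts 1 and 2 denote current and future ages.

```latex
\begin{align}
\ln V_2 &= \beta_0 + \beta_1 \frac{1}{A_2} + \beta_2 S + \beta_3 \ln G_2 + \varepsilon \\
\ln G_2 &= \ln G_1 \left(\frac{A_1}{A_2}\right) + \alpha_0 \left(1 - \frac{A_1}{A_2}\right)
          + \alpha_1 \left(1 - \frac{A_1}{A_2}\right) S + \varepsilon \\
\ln G_2 &= \ln G_1 \left(\frac{A_1}{A_2}\right) + \alpha_0 \left(1 - \frac{A_1}{A_2}\right)
          + \varepsilon \qquad \text{(site term removed)}
\end{align}
```

In this form a negative β₁ makes volume increase with age, and a negative α₁ would make projected basal area decrease as site index improves, which is the inconsistency discussed above.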
The minimal changes in R² (from 96.20% to 95.55%) and RMSE (from 0.97% to 1.06%) between the two forms of the model indicated that excluding the term caused no statistical loss relative to the initial equation. Accordingly, the residual distribution of the reduced equation (Figure 3) presented the same problems as the initial equation: an overestimation of the lower basal areas and an underestimation of the larger ones, coinciding with the trends observed by Castro et al. (2013).
Guera OGM, Silva JAA, Ferreira RLC, Lazo DAA, Barrero Medel H
Regarding the Buckman model modified by Silva (2006) (BMS), the estimates of the parameters related to site index (S₁) and basal area (G₁) were positive, and those related to the inverse of age (1/A₂) were negative. This indicates biological consistency of the estimates, since the signs of these coefficients ensure that both basal area and volume increase with improvement in productive capacity (site index) and/or increase in age (Figure 4).
In the comparison, the BMS equations were superior to those of Clutter (1963). This superiority is evident in the volume projection equations in criteria such as R² (98.97 for Buckman versus 97.45 for Clutter), RMSE (0.08 versus 0.14) and an unbiased residual distribution for the Buckman model (Figure 3). Regarding basal area projection, although the Clutter (1963) model presented better statistical indicators (Table 5), its tendency to overestimate the smaller basal areas and underestimate the larger ones is evident, as previously pointed out. This tendency in the basal area estimates had a marked influence on the volume prognosis, whose accuracy was lower in this model. In the BMS system, the prognoses obtained with the basal area increment equation were not biased (Figure 3).
The regression assumptions also favored the Buckman system. The results of the Kolmogorov-Smirnov test indicated that the Buckman system equations satisfied the normality assumption (p-value > 0.05), and consequently the results of the F and t-tests of this model are reliable (Table 5). This is not the case with Clutter's equations, in which this assumption was not met. Regarding the Durbin-Watson test, the results indicate that only the residuals of the Buckman system are relatively free of autocorrelation. Except for the Clutter volume equation, all equations satisfied the assumption of homoscedasticity, according to the White test results (p-value > 0.05).
Simulations of prognoses with the Buckman system equations made it possible to check their biological realism and the consistency of the estimates obtained (Figure 4). For rotation ages between 30 and 35 years, these prognoses gave yields ranging from V₂ = 160.439 m³/ha (G₂ = 22.46 m²/ha) for site V to V₂ = 356.280 m³/ha (G₂ = 42.81 m²/ha) for site I (Figure 4), indicating proportionality between production, site and age. These results are consistent with those of Francis (1992), who reported basal areas between 20 and 60 m²/ha for the species.

Artificial neural networks for yield prognosis for P. caribaea var. caribaea
The results of ANN training indicated that neural networks of the multilayer perceptron (MLP) type, with between 5 and 11 neurons in the hidden layer, were the most efficient in both prediction and prognosis of Pinus caribaea var. caribaea production in the Macurije Forest Company. With respect to volume prediction, including categorical variables made it possible to obtain ANN_P1, with precise and consistent estimates (Table 6 and Figure 5) characterized by yields proportional to site quality. The technical rotation ages generated by this ANN (Figure 5) were similar to those found with the Schumacher model fitted by site class (Table 4).
The ANNs also provided satisfactory results in prognoses of basal area and volume. Including dummy variables also improved the generalization capacity of the ANNs in both basal area and volume prognosis (Table 6). The results of the Leite & Oliveira (2002) test (Table 7) indicated no significant difference between the observed volumes and those estimated by the two approaches (ANNs and regression equations). This satisfactory result, evidenced by the excellent values of the ANNs evaluation criteria (Table 6) and the regression models (Table 5) distributions, indicated similar performance between both approaches in volume prognosis.
Regarding the basal area prognosis, the results of applying the statistical procedure by Leite & Oliveira (2002) indicated the existence of discrepancy only between the basal areas observed and those estimated by the Buckman equation (Table 7).
In the multivariate comparison, based on both basal area and volume prognoses, the non-significance of Hotelling's T² test (T² = 0.52; F = 0.26 ns) between ANN estimates and observed values indicates that there is no difference between them. However, the values estimated by the BMS system differed significantly from those observed (T² = 32.59; F = 16.17*). This difference is likely related to the low performance of this system in basal area prognosis, according to the univariate comparisons.
These results indicate the superiority of ANNs in production prognosis and agree with Porras (2007) and Ashraf et al. (2013), whose results also pointed to the superiority of ANNs. This superiority can be attributed to characteristics exclusive to ANNs, such as fault tolerance, the parallelism of their structure and their greater parsimony in comparison to traditional regression models.

CONCLUSIONS
The best growth prediction equation for Pinus caribaea var. caribaea plantations was the one obtained through fitting of the Schumacher model.
The flexibility of ANNs allowed for the inclusion of categorical variables (site index and FPBU) that enabled more accurate predictions without losing the biological realism of the models and, consequently, the consistency of the estimates.
Table 7. Results obtained by applying the procedure proposed by Leite & Oliveira (2002).
In production prognosis, the Buckman model modified by Silva et al. (2006) was superior to the Clutter (1963) model.
In volume prognosis, ANNs and the Buckman model modified by Silva et al. (2006) performed similarly. This was not the case in basal area prognosis, in which ANNs generated more accurate estimates than Buckman's equation.