PREDICTING SPATIAL PATTERNS OF THE WINDTHROW PHENOMENON IN DECIDUOUS TEMPERATE FORESTS USING LOGISTIC REGRESSION AND RANDOM FOREST

Forest management needs to evaluate the various hazards that may cause economic or other losses to forest owners. The aim of this study is to prepare windthrow hazard maps based on logistic regression and random forest (RF) models in the Nowshahr Forests, Mazandaran Province, Iran. First, 200 windthrow locations were identified from extensive field surveys and existing reports. Of these, 140 (70%) locations were randomly selected as training data and the remaining 60 (30%) were used for validation. Next, 10 predictive variables (slope degree, slope aspect, altitude, Topographic Position Index (TPI), Topographic Wetness Index (TWI), distance to roads and skid trails, wind effect, soil texture, forest type, and stand density) were extracted from the spatial database. Windthrow hazard maps were then produced using the logistic regression and RF models, and the results were plotted in ArcGIS. Finally, the area under the curve (AUC) and the kappa coefficient were computed to evaluate performance. Validation showed that both the AUC and kappa were higher for the random forest model (97.5% and 95%, respectively) than for the logistic regression model (96.667% and 93.333%, respectively). Random forest therefore has greater potential for evaluating the windthrow phenomenon in forest ecosystems. Additionally, both models indicate that the spatial distribution of the likelihood of windthrow incidence is highly variable in this region. In general, these findings can be applied to the management of future windthrow in favor of economic benefits and environmental preservation.


INTRODUCTION
Strong wind is one of the major disturbances interacting with forest ecosystem processes in both managed and virgin forests (COUTAND, 2017; ANONYMOUS, 2016), and it leads to windthrow. Windthrow refers to the breakage and uprooting of trees by wind; it is a natural phenomenon in forests that results from the interaction among tree, stand, physiographic, climate, and soil components (KOOCH et al., 2014). According to a census conducted by the Forest, Range, and Watershed Management Organization of Iran, a timber volume of more than 25% of the annual allowable cut was damaged by wind in 2016. This means that, in addition to disrupting the planning of cutting and transport, windthrow seriously decreased revenues because of non-salvaged timber (SCHINDLER et al., 2012).
In order to decrease the potential risks from the wind regime and to strengthen forest management against this disturbance, it is of major importance to understand the key drivers of wind disturbance. In recent years, our quantitative understanding of disturbance processes has improved as research has advanced (SEIDL et al., 2011; THOM et al., 2013). These studies provided information on the historical range of variability of forest ecosystems that can be used to manage wind disturbance (KEANE et al., 2009).
Although it is difficult to control windthrow, it is possible to predict different hazard levels in order to minimize wind hazards and avoid potential windthrow (BLENNOW; SALLNÄS, 2004). There are two general approaches to predicting windthrow risk: mechanistic and empirical. The mechanistic approach is based on calculating the wind speed at a given location in single-species or structurally uniform stands (LOCATELLI et al., 2017). In empirical modeling, windthrow risk maps are developed with remote sensing and geographic information systems using a wide range of predictor variables, depending on the specific characteristics of wind events in different forest sites (HALE et al., 2015).
The purpose of the current research is to produce windthrow risk maps using binary logistic regression and random forest models (OLIVEIRA et al., 2012) and to compare their performance in the Nowshahr Forests, Mazandaran Province, Iran. The main difference between this study and the approaches described in the aforementioned publications is that a data-driven Random Forest (RF) model is built and its result is compared with that of a Logistic Regression (LR) model in the study area.

MATERIAL AND METHODS
The study area is located in the western part of Mazandaran Province, in the north of Iran, between latitudes 36° 27′ 30″ and 36° 31′ 30″ N and longitudes 51° 30′ 00″ and 51° 33′ 00″ E (Figure 1). It covers an area of about 1,447 ha. The elevation of the study area ranges from 1,017 to 2,000 meters above sea level. The climate of Nowshahr is temperate and mountainous at higher elevations, while a temperate and semi-humid climate prevails in the plains. The mean annual precipitation within the study area varies from 900 to 1,100 mm. According to the Iranian Meteorological Organization, the maximum and minimum temperatures are 38 °C and −7 °C, respectively. The soils of the study area have two textures: clayey and silty-loamy. The dominant tree species are beech (Fagus orientalis Lipsky), hornbeam (Carpinus betulus L.), maple (Acer velutinum Boiss.), and alder (Alnus subcordata C.A.Mey).

DATA COLLECTION
Generally, data collection and the construction of a database of predictive factors are the most important parts of any such study (ERCANOGLU; GOKCEOGLU, 2002). First, windthrow occurrences and their locations were obtained from extensive field surveys and satellite images. Of the 200 forest windthrow locations, 70% were used for training and the remaining 30% were used for validation (Figure 1). In addition, 200 forest locations without windthrow were selected randomly. For windthrow hazard zonation of the study area, 10 predictive factors were considered: slope degree, slope aspect, altitude, Topographic Position Index (TPI), Topographic Wetness Index (TWI), distance to roads and skid trails, wind effect, soil texture, forest density, and forest type.
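The 70/30 partition of the windthrow locations can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the integer point identifiers, the function name split_locations, and the fixed seed are hypothetical and serve only to make the example repeatable.

```python
import random


def split_locations(location_ids, train_fraction=0.7, seed=42):
    """Randomly split location IDs into training and validation lists.

    The seed is a hypothetical choice that only makes the example repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(location_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 200 windthrow locations, identified here by placeholder integer IDs
windthrow_ids = range(1, 201)
train_ids, valid_ids = split_locations(windthrow_ids)
print(len(train_ids), len(valid_ids))
```

The same function could be applied to the 200 non-windthrow locations so that both classes are partitioned in the same proportion.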
Physiographic data are among the most important factors in any windthrow hazard rating system. The impacts of slope degree, slope aspect, and elevation on wind behavior have been widely reported in the literature (CONSTANTINE et al., 2012; VACCHIANO et al., 2016). In this study, a digital elevation model (DEM) was prepared by digitizing contours at a 10 m interval together with survey base points. The DEM has a grid size of 10 m, with 365 columns and 675 rows. From this DEM, the slope degree, slope aspect, elevation, Topographic Position Index (TPI), and Topographic Wetness Index (TWI) maps were derived. Slope aspect was categorized into nine classes: (1) Flat, (2) North, (3) Northeast, (4) East, (5) Southeast, (6) South, (7) Southwest, (8) West, and (9) Northwest.
Another factor is the Topographic Position Index (TPI), which reflects the difference in elevation between a focal cell and all cells in its neighborhood and provides a simple and useful means of classifying the landscape into morphological classes. A further topographic factor is TWI, which is defined by Equation 1 (MOORE et al., 1991):
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \qquad [1]$$
where α is the cumulative upslope area draining through a point and β is the slope angle at that point.
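The two indices can be illustrated with a short sketch. The 3 × 3 neighborhood used for TPI, the function names, and the elevation values are assumptions made for this example; the study does not state its neighborhood size, and the GIS software handles edge cells and flat cells (where tan β = 0) in ways not reproduced here.

```python
import math


def twi(upslope_area, slope_deg):
    """Topographic Wetness Index: ln(alpha / tan(beta))."""
    return math.log(upslope_area / math.tan(math.radians(slope_deg)))


def tpi(dem, row, col):
    """TPI of an interior cell: its elevation minus the mean of its 8 neighbors."""
    neighbors = [
        dem[r][c]
        for r in (row - 1, row, row + 1)
        for c in (col - 1, col, col + 1)
        if (r, c) != (row, col)
    ]
    return dem[row][col] - sum(neighbors) / len(neighbors)


# Hypothetical 3 x 3 elevation window (meters); the center cell is a local high
dem = [
    [1500.0, 1502.0, 1501.0],
    [1503.0, 1510.0, 1502.0],
    [1499.0, 1501.0, 1500.0],
]
print(tpi(dem, 1, 1))    # positive value: the cell sits above its surroundings
print(twi(500.0, 10.0))  # larger upslope area or gentler slope gives higher TWI
```

A positive TPI suggests a ridge-like position and a negative TPI a valley-like position, which is the basis for the morphological classes mentioned above.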
Using the topographic database of the study area, the distance to roads and skid trails was calculated, with buffers at 50 m intervals. The wind effect was calculated from the meteorological database in SAGA-GIS. There are two soil textures in the study area, clayey and silty-loamy. This layer was produced by digitizing the soil texture map of Mazandaran Province (1:100,000 scale) obtained from the Agriculture Department, Iran. Forest density (number of trees per hectare) was mapped in three classes: <150 (low), 150-200 (moderate), and >200 (high). The forest type map distinguishes pure beech and mixed beech stands. Forest stand maps were provided in vector form (1:10,000 scale) by the Forests, Range, and Watershed Management Organization, Iran.
To apply the logistic regression and random forest models, all of the windthrow-inducing factors mentioned above were converted to raster grids with a 10 m × 10 m pixel size.

LOGISTIC REGRESSION
Logistic regression is very popular and is often used for modeling in the natural sciences (STEPHENSON et al., 2006). The main purpose of this model is to find the best equation expressing the relationship between a response variable and multiple predictive variables. Here, the response variable is binary, representing the presence or absence of windthrow. In its simplest form, the logistic model can be written as Equation 2:
$$P = \frac{1}{1 + e^{-Z}} \qquad [2]$$
where P is the probability of an event (windthrow) occurring, which varies from 0 to 1 on an S-shaped curve, and Z is the linear predictor defined by Equation 3, whose value varies from −∞ to +∞:
$$Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \qquad [3]$$
Here β0 is the intercept of the model, β1, β2, ..., βn are the partial regression coefficients, and X1, X2, ..., Xn are the independent variables.
A full model was fitted to the data by maximizing the likelihood function. Predictors were then deleted or inserted stepwise. A backward/forward stepwise model selection method was applied, starting with the full model and alternately omitting and re-introducing one model component at each step (PETERS et al., 2007). Selection stopped when no deletion or insertion of a predictive variable lowered the Akaike Information Criterion (AIC), yielding the model with the lowest AIC value (PETERS et al., 2007).
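As a minimal sketch of Equations 2 and 3 and of the AIC used in the stepwise selection, the following example evaluates a logistic model for one location. The coefficients and predictor values are hypothetical and are not the fitted values of this study; the function names are likewise introduced only for illustration.

```python
import math


def linear_predictor(intercept, coefficients, values):
    """Z = b0 + b1*x1 + ... + bn*xn (Equation 3)."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def windthrow_probability(z):
    """P = 1 / (1 + exp(-Z)) (Equation 2)."""
    return 1.0 / (1.0 + math.exp(-z))


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical coefficients for two predictors; NOT the fitted values of this study
b0, betas = -1.0, [2.0, 0.5]
z = linear_predictor(b0, betas, [1.0, -2.0])  # -1 + 2 - 1 = 0
print(windthrow_probability(z))               # 0.5 at Z = 0
```

In a stepwise search, aic would be evaluated for each candidate model obtained by dropping or re-adding one predictor, and the candidate with the lowest value would be retained until no further change lowers it.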

RANDOM FOREST
Random Forest is a tree-based ensemble technique built by Recursive Partitioning (RPART) (BREIMAN, 2001). It is a machine learning tool whose trees are typically grown following the Classification and Regression Tree (CART) methodology, in which binary splits recursively partition the data into homogeneous terminal nodes. A good binary split divides a parent node into two daughter nodes with improved homogeneity. This procedure is repeated for hundreds or thousands of trees, each grown on a bootstrap sample of the original data.
The parameters ntree and mtry are used to set up and construct the RF. ntree is the number of trees to grow, which was set to 1,000 in the present study. mtry is the number of variables randomly sampled as candidates at each split, set to the square root of p, where p is the number of predictive variables.
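The building blocks described above (bootstrap sampling, the mtry setting, and the gain in homogeneity from a binary split) can be sketched in a few lines. This is an illustration only and not the randomForest implementation: rounding the square root down is an assumption that follows the default convention of the R randomForest package for classification, and the seed, labels, and function names are hypothetical.

```python
import math
import random


def default_mtry(n_predictors):
    """Number of candidate variables per split: floor(sqrt(p))."""
    return math.floor(math.sqrt(n_predictors))


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def gini_impurity(labels):
    """Gini impurity of a node holding binary labels (0 = absent, 1 = windthrow)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(parent, left, right):
    """Decrease in Gini impurity obtained by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - weighted


rng = random.Random(0)  # hypothetical seed, for repeatability only
print(default_mtry(10))                          # 3 candidate variables when p = 10
print(len(bootstrap_indices(140, rng)))          # a bootstrap sample keeps the sample size
print(split_gain([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5: a perfectly separating split
```

With the 10 predictive variables of this study, this convention would give 3 candidate variables at each split; summing such impurity decreases per variable over all trees is the usual basis of Gini-based variable importance.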

MODEL PERFORMANCE
Of the 200 windthrow locations identified, 140 (70%) were used as training data for the windthrow hazard maps, while the remaining 60 (30%) were used to evaluate performance. The Receiver Operating Characteristic (ROC) curve is a graphical plot that assesses the performance of a binary classifier as its discrimination threshold is varied. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. TPR and FPR are also known as sensitivity and probability of false alarm (1 − specificity), respectively. The Area Under the ROC Curve (AUC) serves as a global accuracy statistic for the model and is threshold independent. According to Yesilnacar (2005), the quantitative-qualitative relationship between prediction accuracy and the AUC value can be classified as follows: 0.5-0.6 (poor), 0.6-0.7 (average), 0.7-0.8 (good), 0.8-0.9 (very good), and 0.9-1 (excellent).
Besides sensitivity, specificity, and AUC, a further performance measure is Cohen's kappa coefficient, which quantifies the agreement between predicted and observed classes beyond that expected by chance.
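Both statistics can be computed directly from validation labels and predicted probabilities, as the following sketch shows. The labels, scores, and the 0.5 classification threshold are hypothetical and are not data from this study; the AUC is computed here through its rank interpretation (the probability that a windthrow point receives a higher score than a non-windthrow point), which is equivalent to the area under the ROC curve.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (po - pe) / (1 - pe)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = 0.0
    for label in (0, 1):
        expected += (y_true.count(label) / n) * (y_pred.count(label) / n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical validation labels and predicted probabilities (not study data)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
print(roc_auc(y_true, scores))      # 0.75
print(cohen_kappa(y_true, y_pred))  # 0.5
```

Note that the AUC uses the continuous scores and is threshold independent, whereas kappa depends on the threshold chosen to convert probabilities into classes.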

SPATIAL PREDICTIONS
Spatial predictions were made in ArcGIS ver. 9.3. Models were exported from R statistical software as a text file and interpreted in ArcGIS by an Avenue script made available with the Rcmdr and randomForest packages. Lookup tables describe each response curve point by point. The resulting pixel values were then classified with the natural breaks classification scheme into low, moderate, high, and very high classes.
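Once break values are available, assigning each pixel to a hazard class is a simple lookup, sketched below. The break values and pixel probabilities are hypothetical; this example does not compute natural breaks, which in the study were obtained from the GIS classification routine.

```python
from bisect import bisect_right

CLASS_NAMES = ("low", "moderate", "high", "very high")


def classify_probability(p, breaks):
    """Map a probability to a hazard class given three ascending break values."""
    return CLASS_NAMES[bisect_right(breaks, p)]


# Hypothetical break values; in practice they come from the natural breaks routine
breaks = [0.25, 0.50, 0.75]
pixels = [0.05, 0.30, 0.62, 0.91]
print([classify_probability(p, breaks) for p in pixels])
```

In this sketch a value exactly equal to a break is assigned to the higher class, which is a design choice of the example rather than a documented behavior of the GIS software.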

RESULTS
Beech trees accounted for the highest share of windthrow (Table 1). Descriptive statistics of the quantitative predictive variables are shown in Table 2. When logistic regression was used to assess the power of individual variables in a statistical model, the strongest predictors were wind effect and stand density, followed by soil texture, TWI, and slope degree (Table 3). Although altitude and distance to roads and skid trails were weakly related to windthrow, they may still be important in the generalized linear model (Table 3). All differences in the values of the predictive variables between windthrow and non-windthrow points were statistically significant at the 5% level based on a χ² test. In addition, the pseudo-R-squared, the AIC, and the number of Fisher scoring iterations of the logistic regression were 74.80%, 119.81, and 7, respectively. Based on the results of the current research, Z is defined by Equation 4.
The final random forest model for wind damage was constructed by averaging over the bootstrap data sets. The Gini coefficient of each predictive variable is presented in Table 4. The model predicts that the probability of windthrow is high when the wind effect is in the range 0.95-1.25 (Figure 2a), at altitudes of 1,600-1,750 m (Figure 2b), at shorter distances to roads and skid trails (Figure 2c), on clayey soil (Figure 2d), on ridges according to the topographic position index (Figure 2e), and in dense forest stands (Figure 2f). With respect to forest type (Figure 2g), TWI (Figure 2h), slope degree (Figure 2i), and slope aspect (Figure 2j), the relationship between these predictors and windthrow occurrence was weak or showed no clear pattern.
ROC plots and the kappa coefficient were examined to assess model accuracy in training and validation (Table 5). Comparing the predictive performance of logistic regression and RF indicates that the differences between the models are of practical interest. The main difference observed between the models was in deviance explained, suggesting that the RF model has better predictive performance than logistic regression.
Figures 3a and 3b show the spatial prediction of the probability of windthrow occurrence as predicted by the logistic regression and RF models. The two models gave nearly similar spatial predictions. For both models, the highest probability of windthrow hazard was predicted in two regions, which are distinguished on the maps by light and dark colors. Both areas had similar environmental data, with low to high levels of the predictor variables. The areal extents of the map sub-classes for both models are reported in Table 6.

DISCUSSION
When studying the interaction between windthrow and a collection of predictive variables, it is necessary to consider where, when, and to what degree an ecosystem is affected (VALINGER; FRIDMAN, 2011). The current study therefore draws general conclusions about wind damage with great caution. According to our results, wind effect had the greatest impact on wind damage in both models. Wind effect identifies windy spots based on terrain attributes (DUPONT, 2016; COUTAND, 2017). Wind characteristics are the primary factor affecting the vulnerability of forest stands to windthrow. This factor can vary significantly within a specific region (HALE et al., 2015), as the produced maps show. The topography of these mountainous forest ecosystems is relatively complex and could locally add to the general occurrence of high winds (SCHINDLER et al., 2012). In this study, important variations in wind effect caused by topography were observed.
Altitude is a crucial physiographic variable associated with wind impact (RUEL, 2000) and thus plays an important role in the spread of windthrow (PELTOLA et al., 2010). The altitude map for the study area was prepared from the DEM and categorized into three classes according to expert knowledge and a literature review; predicted wind damage increased with increasing altitude (> 1,500 m).
There was an important relationship between distance to roads and skid trails and windthrow hazard in both models, especially RF. Near roads or skid trails, wind is funneled along the line of the opening and wind speed increases (RUEL, 2000). Since a significant concentration of windthrow in the study area occurred near paths exposed to the damaging winds, it is likely that a wind tunnel effect is created there (SUVANTO et al., 2016).
Soil texture had a strong influence on windthrow hazard. Clay soil has a high shear resistance, which limits root growth. For most positions in forest stands, uprooting of windward roots is more likely on clay soils than on silty-loamy soils (DUPUY et al., 2005). Stokes et al. (2005) reported that storm winds created an extensive network of cracks in fine-textured soils around the tree during overturning.
In the present research, windthrow was found to be highest on ridges, followed by slopes. This behavior is not reproduced in canyons (RUEL, 2000). The windthrow level was lower on slopes, which could account for the fact that its effect was non-significant.
Stand characteristics play a significant role in wind damage. As shown in previous research (DUPUY et al., 2005), windthrow hazard increases with the number of trees per hectare. In dense stands, the interaction between turbulent airflow and the tree canopy increases. Additionally, our study shows that stands dominated by beech (pure beech stands) are at much higher risk than mixed beech stands. According to previous studies, mixed stands are, in general, less prone to damage than pure stands during windstorms (VALINGER; FRIDMAN, 2011). It also appears that some species in mixed stands, such as maple, are less vulnerable than beech. Such species have a favorable rooting system and are less prone to decay (SCHELHAAS, 2008).
Forest sites with high TWI can be more prone to windthrow during storms. TWI has general potential in windthrow hazard rating systems because root system morphology is strongly affected by it. However, the results were contrary to expectations, since wind damage was greater at low TWI levels. This could probably be explained by an underestimation of wind effect in these regions.
Because windstorms are common in mountainous forests, it is important to study the impact of slope degree on windthrow and tree stability in order to refine our predictive models.
Trees in this study were more resistant to overturning on slopes steeper than 15°. Analysis of these data using our models indicated that the differences from gentler slopes were attributable to better root anchorage (PELTOLA et al., 2010).
In the case of slope aspect, the high values for north- and northeast-facing slopes show that these categories have a positive spatial association with windthrow. The local and regional dominant winds from the Mediterranean Sea in the N and NE of the area make these slopes prone and susceptible to windthrow.
This study presented an application of two different models (logistic regression and RF) to predict windthrow hazard. The RF model appeared to be a valuable tool providing reasonable predictions in forest stands. The better overall validation of the random forest technique can be attributed to a significantly higher proportion of correct predictions for windthrow. Modern modeling techniques such as RF can improve model fitting and provide better predictions of the response variable than the logistic model. RF therefore produces a more accurate hazard susceptibility map than the logistic model does. These findings are in agreement with Hale et al. (2015) and Suvanto et al. (2016).

CONCLUSION
The Nowshahr forests are the area of Mazandaran Province with the highest windthrow incidence. The density of wind damage is irregularly distributed in space and time. The probability of windthrow occurring depends on the interactions among the stand characteristics, terrain attributes, soil properties, and human variables that affect the spread of windthrow. In this research, the likelihood of windthrow occurrence was predicted using two different methods, Logistic Regression and Random Forest. RF achieved higher accuracy, which may reflect spatial autocorrelation in the model residuals and the existence of nonlinear trends. Unfortunately, at the global level, the lack of detailed information on windthrow occurrences means that natural disturbances such as winds or storms are not taken into account in forest management. The importance of this research lies in developing advanced techniques for forest management ahead of future windthrow. Besides protecting the environment, this can prevent huge economic losses in both plantation and natural forests.

FIGURE 1 Windthrow locations and altitude map of the study area (red and yellow locations were used for training and validation in the modeling process, respectively).

TABLE 1
Number and percentage of windthrow in each tree species.

TABLE 2
Descriptive statistics results of predictive variables.

TABLE 3
The most important predictor variables based on logistic regression.

TABLE 4
Gini coefficient of variables based on the RF model.

TABLE 5
Accuracy of the logistic regression and RF models in training and validation (ROC and kappa coefficient).
FIGURE 3 Windthrow hazard susceptibility maps produced by (A) logistic regression and (B) random forest (each hazard class displays the probability of windthrow occurrence in that area).

TABLE 6
Covered area percentage for windthrow zones in sub-classes.