Landslide susceptibility mapping using the statistical method of Information Value: A study case in Ribeirão dos Macacos basin, Minas Gerais, Brazil

Abstract: This study was carried out in the Ribeirão dos Macacos basin, Nova Lima, Minas Gerais state, Brazil. The information value statistical method was applied to build a landslide susceptibility map at 1:25,000 scale. Different partitions of the inventory were tested, as well as different landslide predisposing factors. During construction of the landslide inventory, slopes facing south, southeast and southwest generally appeared with higher quality in aerial/orbital images because of the position of the sun (lighting direction), which emphasizes surface structures, whereas old landslides on slopes facing north, northeast and northwest may be omitted. This condition can generate misleading models when the slope aspect is used. It was also verified that models with a higher Area Under the Curve (AUC) do not always restrict the high susceptibility class to smaller areas. This incongruence arises from the different shapes of the curves, since a curve with a lower AUC can yield more restrictive results than a curve with a higher AUC. The results showed that the model fits the input data well and has a high landslide predictive capacity.


INTRODUCTION
The expansion of Brazilian urban centers in recent decades has been accompanied by intense urbanization of areas highly susceptible to landslides. The most common natural disasters in Brazil are generally associated with periods of intense and prolonged rainfall (Tominaga et al. 2009). According to the records of CENAD (the National Center for Risk and Disaster Management), the southeast region, where the study area is located, was the most affected by landslides in 2013 (Brazil 2014).
According to the data published in the report on Material Damage and Losses Due to Natural Disasters in Brazil, the country experienced about 9,000 hydrological disasters, which encompass mass movements and floods, between 1995 and 2014, accounting for damages and losses on the order of 70 billion reais (CEPED-UFSC 2014). In addition, according to CRED (2016), the average death toll related to this type of disaster in South America between 2006 and 2015 was 540 people per year.
The importance attributed to this type of disaster is reflected in initiatives that seek to increase knowledge and implement policies that reduce the risks to which society is exposed. Internationally, we can mention the Hyogo Framework for Action 2005-2015, promoted by the United Nations Office for Disaster Risk Reduction (UNISDR) (Garcia 2012). In Brazil, Law 12,608/2012, which established the National Policy for Protection and Civil Defense (PNPDEC) (Brazil 2012), is significant. This law aims to meet the demand for legal instruments that lead to orderly and sustainable urban development, in order to prevent and minimize natural disasters. In this context, this work analyzes landslide susceptibility using the statistical method of information value (Yan 1988, Yin & Yan 1988). The models simulated in a Geographic Information System (GIS) were validated using success and prediction rate curves (Sterlacchini et al. 2011).

SUSCEPTIBILITY TO LANDSLIDES
Landslide susceptibility portrays the probability of a movement occurring according to the terrain characteristics (Guzzetti et al. 2006, Sobreira & Souza 2012). It is based on the concept that new landslides will occur under the same conditions that generated landslides in the past (Guzzetti et al. 1999, Barella et al. 2019), and it can be used to predict the geographic location of future mass movements (Chung & Fabbri 1999, Guzzetti et al. 2005). It should be noted that susceptibility is not intended to predict the time or frequency of events, only their spatial location (Guzzetti et al. 2005).
Landslide susceptibility maps should indicate the areas of the terrain with greater and lesser predisposition to a given geological process, classified according to the degree of susceptibility (Bittar et al. 1992, Bittar 2014), and should also inform the type of mass movement expected (Aleotti & Chowdhury 1999, Tominaga 2007).
Currently, landslide susceptibility maps use an integrated approach to the physical environment in a geographic information system (GIS), owing to its greater data storage capacity, the possibility of constant updates, and faster qualitative and quantitative spatial analysis (Guzzetti et al. 2006, Barella 2016).

METHODOLOGIES FOR LANDSLIDE SUSCEPTIBILITY ASSESSMENT
Several approaches can be applied to develop landslide susceptibility maps (Barella & Sobreira 2015). Currently, most of the methods employed combine the landslide predisposing factors through map algebra.
Methods of susceptibility analysis are divided into qualitative and quantitative (Soeters & van Westen 1996, Aleotti & Chowdhury 1999). The qualitative methods include geomorphological analysis and heuristic models, which are highly subjective since they depend directly on the expert's knowledge (Guzzetti et al. 1999). The quantitative methods comprise deterministic and statistical models, which seek to standardize the analysis criteria and reduce the professional's intervention in the elaboration of the model.
One of the first statistical susceptibility models for geodynamic processes was developed in California (USA) in the 1970s (Brabb et al. 1972). Since then, methodologies and analysis tools have been refined and developed, resulting in many statistical studies around the world (e.g. Varnes et al. 1984, Soeters & van Westen 1996, Guzzetti et al. 2012, Chung & Fabbri 2003, Glade et al. 2005, Lee et al. 2007, Corominas et al. 2014, Zêzere et al. 2017, Reichenbach et al. 2018).

Statistical methods
Statistical methods are built on the premise that the landslide predisposing factors of past events will be correlated with future events (Aleotti & Chowdhury 1999, Guzzetti et al. 1999, Fell et al. 2008, Barella et al. 2019). These methods are indirect and establish spatial correlations between processes and the parameters that cause instability (Guzzetti et al. 1999). The landslide predisposing factors that led to landslides in the past are therefore evaluated statistically, and the landslide susceptibility of currently stable areas is determined from quantitative predictions (Soeters & van Westen 1996).
Statistical models are built through a sequence of procedures (Aleotti & Chowdhury 1999) and generally use mapping units in raster (matrix) format. Following Hengl (2006), the ideal resolution (cell size) for the 1:25,000 scale lies between 62.5 m (coarse resolution) and 6.25-2.5 m (fine resolution).
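The cell sizes quoted above are the scale number multiplied by a map distance. A minimal Python sketch, assuming map distances of 2.5 mm, 0.25 mm and 0.1 mm that were back-calculated here from the values in the text rather than quoted from Hengl (2006); the dictionary keys and the function name are hypothetical:

```python
# Ground cell size (m) = scale number x map distance (m).
# The map distances are back-calculated from the 62.5 m, 6.25 m and 2.5 m
# values quoted for 1:25,000 (an assumption of this sketch).
MAP_DISTANCES_M = {
    "coarse": 0.0025,       # 2.5 mm on the map
    "fine_upper": 0.00025,  # 0.25 mm on the map
    "fine_lower": 0.0001,   # 0.1 mm on the map
}

def cell_sizes(scale_number: int) -> dict:
    """Return candidate ground cell sizes (m) for a map scale 1:scale_number."""
    return {name: round(scale_number * d, 4) for name, d in MAP_DISTANCES_M.items()}

print(cell_sizes(25_000))  # {'coarse': 62.5, 'fine_upper': 6.25, 'fine_lower': 2.5}
```

The same factors applied to the 1:50,000 thematic maps would give cells ten times... no, twice as large, which is one reason to check that the thematic and topographic inputs are compatible with the chosen grid.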
Despite their lower subjectivity compared with qualitative methods, statistical methods usually have some disadvantages (van Westen et al. 2003): i) the generalization imposed by assuming that all the processes in the study area react to the same combination of conditioning factors; and ii) the tendency to simplify the input parameters, since in general only the most readily available cartographic parameters are used.

Study area
The Ribeirão dos Macacos basin is located in the central region of Nova Lima, Minas Gerais, Brazil (Figure 1). It has an area of 131 km² and altitudes ranging from 730 m to 1,540 m. The region is marked by conflicting interests: it is one of the main water sources of the Metropolitan Region of Belo Horizonte, the capital of Minas Gerais State, while at the same time it hosts a significant concentration of mining activities (Davis et al. 2005).
Geologically, the area lies in the region called Quadrilátero Ferrífero (Iron Quadrangle), known for its concentrations of ferruginous rocks and its iron ore production. The geological context is dominated by Precambrian rocks, represented by two main lithostratigraphic units: i) the Rio das Velhas Supergroup, formed by Archean greenstones and metasedimentary units of medium- to low-grade metamorphism; and ii) the Minas Supergroup, which consists of Proterozoic metasedimentary rocks, also of medium- to low-grade metamorphism (Alkmim & Marshak 2008).
The stratigraphy and geological structures control the three main morphostructural units in the area: the Curral Ridge, with WSW-ENE alignment, at the northern boundary; the Moeda Syncline Plateau, with N-S alignment, in the eastern portion; and the Anticline Valley of the Rio das Velhas, a fluvial depression excavated along the axis of an anticline and surrounded by elevated synclines (Medina et al. 2005).

Model development strategies
The study involved a series of procedures, some carried out simultaneously and others individually, as presented in Figure 2.

Landslide predisposing factors
The landslide predisposing factors that affect slope stability are numerous and diverse, and they can interact in complex and often subtle ways (Varnes et al. 1984). They are static, inherent to the terrain, and correlated with the degree of potential slope instability and the spatial variation of landslide susceptibility (Popescu 1994). Considered individually, they do not trigger landslides, but only act as catalysts for dynamic factors (e.g. rainfall) (Glade et al. 2005).
These factors are related to physical environmental characteristics such as topography, geology, soils, hydrology and geomorphology. Topography is one of the most relevant factors in landslide susceptibility analysis (van Westen et al. 2008, Corominas et al. 2014) and can be considered the main source of information for building forecasting models, since several thematic maps are derived from it, such as slope angle, slope aspect and slope curvature. The cartographic database included geology, geomorphology and pedology (Minas Gerais 2005) at 1:50,000 scale, in addition to maps derived from topography at 1:25,000 scale: slope angle, slope aspect and slope curvature (Figure 3).
The slope aspect was also addressed in the study but was not used in the final model. During inventory construction, it was noticed that S, SE and SW slopes generally presented a better representation of ancient landslides because of the position of the sun (lighting direction), which emphasizes surface features, so that more landslides were cataloged on these slopes. For this reason, the slope aspect was excluded from the final model, because it would have overestimated the information value of the slopes facing S, SE and SW relative to those facing N, NE and NW.
The geology (Silva 2005) was grouped into 13 lithological units (UL), based on observations or stability inferences, as suggested by Varnes et al. (1984). Where a mapped body had reduced dimensions without significant surface expression, it was merged into the adjacent unit of greatest geotechnical similarity. The 8 geomorphological units (UGM) are based on data from Medina et al. (2005) and were refined and/or grouped using remote sensing.
The pedological units (UP) are based on Shinzato & Carvalho Filho (2005) and were grouped into 8 units at the order level of the Brazilian Soil Classification System (Embrapa 2006).
The slope angle classes were defined by "trial and verification": intervals of 2°, 5°, 10° and 15° were tested, in addition to non-standard intervals defined qualitatively. The 5° interval returned the greatest Area Under the Curve (AUC) when the information value method was applied.
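The interval test can be sketched in Python as follows. This is a minimal illustration on synthetic data, assuming that the comparison criterion is the success-rate AUC computed on the same cells used to derive the information values; the helper names (`class_iv`, `rate_auc`, `compare_intervals`) are hypothetical and the data do not come from the study.

```python
import numpy as np

def class_iv(classes, landslide):
    """Per-cell information value ln((S_i/N_i)/(S/N)); -inf where S_i = 0."""
    _, inv = np.unique(classes, return_inverse=True)
    n_i = np.bincount(inv)                                   # cells per class
    s_i = np.bincount(inv, weights=landslide.astype(float))  # landslide cells per class
    with np.errstate(divide="ignore"):
        iv = np.log((s_i / n_i) / landslide.mean())          # landslide.mean() = S/N
    return iv[inv]

def rate_auc(score, landslide):
    """Area under the rate curve; cells with tied scores form one step."""
    x, y = [0.0], [0.0]
    for s in np.unique(score)[::-1]:                         # descending susceptibility
        m = score == s
        x.append(x[-1] + m.mean())                           # cumulative area fraction
        y.append(y[-1] + landslide[m].sum() / landslide.sum())  # cumulative landslides
    x, y = np.array(x), np.array(y)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

def compare_intervals(slope_deg, landslide, widths=(2, 5, 10, 15)):
    """Bin slope with each interval width and return the AUC for each width."""
    return {w: rate_auc(class_iv(np.floor(slope_deg / w).astype(int), landslide), landslide)
            for w in widths}

# Synthetic example (not the study data): landslide probability grows with slope.
rng = np.random.default_rng(0)
slope = rng.uniform(0, 60, 10_000)
landslide = rng.random(10_000) < 0.002 + 0.02 * (slope / 60) ** 2
aucs = compare_intervals(slope, landslide)
best_width = max(aucs, key=aucs.get)
print(aucs, best_width)
```

Because the scores and the AUC come from the same cells here, finer intervals may be favored by this criterion; repeating the comparison against a test inventory (prediction rate) would help guard against overfitting.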
The slope curvature map (also called the slope profile map) was produced by combining the transverse and longitudinal profiles, each decomposed into concave, convex and linear forms. These combinations generated 9 (3 × 3) morphological classes, from which the control exerted by the slopes on water movement and soil moisture content could be inferred. This control may influence the distribution of vegetation and the occurrence of geological processes (Wysocki et al. 2011).
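The 3 × 3 combination can be sketched as follows. The flatness tolerance `tol` and the sign convention (positive = convex) are assumptions of this sketch; GIS packages differ on the sign of profile curvature, so the convention should be checked against the software actually used. The function names are hypothetical.

```python
import numpy as np

def curvature_form(curv, tol):
    """Classify curvature as concave, linear or convex (assumes positive = convex)."""
    curv = np.asarray(curv, dtype=float)
    return np.where(curv > tol, "convex", np.where(curv < -tol, "concave", "linear"))

def curvature_classes(profile, plan, tol=0.01):
    """Combine longitudinal (profile) and transverse (plan) forms into 9 classes."""
    prof = curvature_form(profile, tol)
    trans = curvature_form(plan, tol)
    return np.array([f"{p}-{t}" for p, t in zip(prof.ravel(), trans.ravel())])

# Every combination of the three forms appears once on this 3 x 3 grid.
vals = np.array([-1.0, 0.0, 1.0])
P, T = np.meshgrid(vals, vals)
print(sorted(set(curvature_classes(P, T))))
```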

Landslide inventory map
Landslide inventory maps are essential for mapping landslide susceptibility (Fell et al. 2008). During their elaboration, old landslides that are morphologically visible in the area are mapped as points or polygons (Parise 2001, Oliveira 2012). Landslide inventory maps can be built by a variety of methods, such as stereoscopic analysis of aerial photographs, field geomorphological mapping, engineering geology investigations, remote sensing techniques, and compilation of data from historical archives (Guzzetti et al. 2000, Guzzetti 2005).
Several factors influence the accuracy of the inventory, such as: i) the scale, date, cloud cover and resolution of the aerial photographs or remote sensing images; ii) the type, scale and quality of the map used to present information about the landslides; iii) the tools used in image interpretation and analysis; iv) the knowledge and experience of the interpreter in image analysis; and v) the interference caused by light and shadow in the images, due to the angle of sunlight incidence at the time of acquisition (Guzzetti et al. 2012, Rogers & Doyle 2003). The landslide inventory maps were built with remote sensing techniques applied to orbital and aerial images, on which polygons were traced to delimit the landslide features found. Topography was also used to delimit landslides, following the concepts proposed by Rogers & Doyle (2003). Field campaigns were conducted at specific sites, selected according to areas defined qualitatively in the remote sensing analysis. The objective was to calibrate the photointerpretation process and to validate the inventory in situ.
In total, 313 features left by ancient landslides were identified (Figure 3f) and randomly divided into two groups: 157 for training and 156 for testing. The typology of these movements was not discriminated, although translational landslides are known to predominate, with a smaller number of rotational landslides.
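A random split of the 313 polygons into 157 training and 156 test features can be sketched with Python's standard library. The feature IDs and the seed are hypothetical values used only for illustration and reproducibility; splitting by feature rather than by pixel keeps each scar entirely within one group.

```python
import random

def split_inventory(feature_ids, n_train, seed=None):
    """Shuffle the feature IDs and return (training, test) lists."""
    ids = list(feature_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

# 313 mapped scars, numbered 1..313 for illustration (hypothetical IDs).
train_ids, test_ids = split_inventory(range(1, 314), n_train=157, seed=42)
print(len(train_ids), len(test_ids))  # 157 156
```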
The features cataloged in the landslide inventory map cover about 0.2% of the Ribeirão dos Macacos basin and are heterogeneously distributed in three higher-density clusters: in the eastern portion, possibly associated with the Moeda Syncline Plateau; in the northern portion, possibly associated with the Curral Ridge; and in the south-central portion, which shows no apparent correlation with any morphostructural unit.

Information value method
The information value method (Yan 1988, Yin & Yan 1988) is a bivariate statistical approach that combines the spatial distribution of landslides (dependent variable) with the classes of each landslide predisposing factor (independent variables), weighting the classes by their respective landslide densities (Soeters & van Westen 1996).
The evaluation by this method is divided into two stages (Yan 1988, Yin & Yan 1988). In the first stage, the weight of each class of each landslide predisposing factor is calculated from its intersection with the mass movements cataloged in the landslide inventory through the expression:

$$I_i = \ln \frac{S_i / N_i}{S / N} \qquad (1)$$

where $I_i$ is the information value of variable $x_i$; $S_i$ is the number of mapping units (grid cells) with landslides of type $y$ within variable $x_i$; $N_i$ is the number of mapping units with variable $x_i$; $S$ is the total number of mapping units with landslides of type $y$; and $N$ is the total number of mapping units in the study area.
In the second stage, susceptibility is estimated by summing the information values:

$$I_j = \sum_{i=1}^{m} X_{ij}\, I_i \qquad (2)$$

where $I_j$ is the total information value of pixel $j$; $m$ is the number of variables; and $X_{ij}$ equals 1 if variable $x_i$ is present in pixel $j$ and 0 otherwise.
In practice, the information value method compares the landslide density in each class of a conditioning factor with the mean density of the whole area, applying a logarithmic transformation that widens the numerical range, with values ranging from -∞ to +∞. Positive values indicate classes that favor landslides, while negative values indicate classes with little influence on landslide development. The degree of importance of these values is related to their magnitude (Yin & Yan 1988).
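Equations (1) and (2) can be sketched with NumPy on categorical rasters. This is a minimal illustration, not the ArcGIS 9.3 workflow used in the study: the function names and the toy rasters are hypothetical, each cell is assumed to belong to exactly one class per factor (so $X_{ij} = 1$ for that class only), and classes without landslides are left at -∞, which in practice may be replaced by a finite floor value.

```python
import numpy as np

def information_value_map(factor, landslide):
    """Per-cell I_i for one categorical factor raster, following Eq. (1)."""
    f = np.asarray(factor).ravel()
    ls = np.asarray(landslide, dtype=bool).ravel()
    _, inv = np.unique(f, return_inverse=True)
    n_i = np.bincount(inv)                             # N_i: cells in each class
    s_i = np.bincount(inv, weights=ls.astype(float))   # S_i: landslide cells in each class
    prior = ls.sum() / ls.size                         # S / N
    with np.errstate(divide="ignore"):                 # ln(0) -> -inf for empty classes
        iv = np.log((s_i / n_i) / prior)
    return iv[inv].reshape(np.shape(factor))

def susceptibility(factors, landslide):
    """Eq. (2): sum of the information values of the class present in each cell."""
    return sum(information_value_map(f, landslide) for f in factors)

# Toy 2 x 2 study area with one landslide cell (hypothetical data).
landslide = np.array([[1, 0], [0, 0]], dtype=bool)
geology = np.array([["A", "A"], ["B", "B"]])
slope_class = np.array([[1, 2], [1, 2]])
s = susceptibility([geology, slope_class], landslide)
print(s)  # top-left cell = 2 ln 2; the other cells are -inf
```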
The information value method was applied in ArcGIS 9.3, integrating the map data with the Map Algebra tool. The rasters were integrated one at a time, following a pre-established order defined by the sensitivity analysis.

Sensitivity analysis
Quantifying the importance of the factors involved in the landslide process is called "Sensitivity Analysis". This tool identifies the individual ability of each landslide predisposing factor and ranks these parameters in order to generate more robust combinations that produce higher-quality susceptibility maps, while reducing the volume of data used and making the data processing less complex (Zêzere et al. 2005).
For this purpose, the Accountability ($A_I$) and Reliability ($R_I$) indices (Greenbaum et al. 1995, Meneses et al. 2017) were used, together with the Area Under the Curve (AUC) associated with the information value.
The $A_I$ index measures how relevant the different classes of the landslide influencing parameters are to the analysis according to the landslide features they contain, while $R_I$ uses the average landslide density of each class of the predictor variable to define its relevance. The indices $A_I$ and $R_I$ are calculated by equations 3 and 4 (Meneses et al. 2017):

$$A_I = \frac{k}{N} \qquad (3)$$

$$R_I = \frac{k}{y} \qquad (4)$$

where $k$ is the landslide area in the classes with conditional probability greater than the considered probability; $N$ is the total landslide area; and $y$ is the area of the classes of the independent variable with conditional probability above the considered probability.
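Equations (3) and (4) can be illustrated with a short Python sketch. It assumes that the "considered probability" is the mean landslide density of the whole area (total landslide area divided by total area); the function name and the class areas are hypothetical, in arbitrary but consistent units.

```python
def accountability_reliability(class_area, class_landslide_area, threshold=None):
    """Return (A_I, R_I) for one predisposing factor.

    class_area: {class: area of the class}
    class_landslide_area: {class: landslide area inside the class}
    threshold: considered probability; defaults to the mean landslide
    density of the whole area (an assumption of this sketch).
    """
    total_area = sum(class_area.values())
    total_ls = sum(class_landslide_area.values())          # N in Eq. (3)
    if threshold is None:
        threshold = total_ls / total_area
    selected = [c for c, a in class_area.items()
                if class_landslide_area.get(c, 0) / a > threshold]
    k = sum(class_landslide_area.get(c, 0) for c in selected)
    y = sum(class_area[c] for c in selected)
    return k / total_ls, (k / y if y else 0.0)

# Hypothetical factor with three classes; only class B exceeds the mean density.
a_i, r_i = accountability_reliability({"A": 50, "B": 30, "C": 20},
                                      {"A": 1, "B": 3, "C": 1})
print(a_i, r_i)  # 0.6 0.1
```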
The AUC was proposed as a validation tool for statistical susceptibility models and is widely used in the technical literature. When linked to a statistical model, it can also be used in sensitivity analysis to assess the suitability of each predictor variable (Zêzere et al. 2005, Piedade et al. 2010). After calculating the AUC, $A_I$ and $R_I$ indices, each landslide predisposing factor was assigned a position (1 to 5). The order of integration was established from the arithmetic mean of the positions in the three index rankings, which defines the degree of importance of each parameter in the process investigated. In total, four landslide susceptibility models were produced by progressively integrating the conditioning parameters in this order.
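The ranking step can be sketched as follows: each factor is ranked separately by AUC, $A_I$ and $R_I$ (position 1 = highest value, an assumption of this sketch), the three positions are averaged, and factors are integrated in ascending order of mean position. The factor names and index values are hypothetical and are not those of Table II; ties keep insertion order because Python's sort is stable.

```python
def integration_order(indices):
    """indices: {factor: {"AUC": .., "A_I": .., "R_I": ..}}; higher values rank first."""
    factors = list(indices)
    index_names = list(next(iter(indices.values())))
    mean_rank = dict.fromkeys(factors, 0.0)
    for name in index_names:
        ranked = sorted(factors, key=lambda f: indices[f][name], reverse=True)
        for position, f in enumerate(ranked, start=1):
            mean_rank[f] += position / len(index_names)
    return sorted(factors, key=mean_rank.get), mean_rank

# Hypothetical index values for three factors.
indices = {
    "factor_A": {"AUC": 0.80, "A_I": 0.70, "R_I": 0.30},
    "factor_B": {"AUC": 0.75, "A_I": 0.80, "R_I": 0.20},
    "factor_C": {"AUC": 0.60, "A_I": 0.50, "R_I": 0.10},
}
order, mean_rank = integration_order(indices)
print(order)  # ['factor_A', 'factor_B', 'factor_C']
```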
The models built in this way return a wide range of information values that must be classified. Using the prediction curves, landslide susceptibility was zoned into three classes: a high susceptibility class, which should predict 85% of the landslides; a medium susceptibility class, which should predict 10%; and a low susceptibility class, with a predictive capacity of 5%.
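The zoning rule can be sketched as follows: cells are sorted by decreasing information value, and the class boundaries are the scores at which the cumulative share of landslide cells first reaches 85% and 95% (85% + 10%). The function name and the toy inventory are hypothetical; tied scores at a boundary all fall into the upper class.

```python
import numpy as np

def zone_susceptibility(score, landslide, cuts=(0.85, 0.95)):
    """Split cells into high/medium/low classes holding 85%, 10% and 5% of landslides."""
    score = np.asarray(score, dtype=float).ravel()
    landslide = np.asarray(landslide, dtype=bool).ravel()
    order = np.argsort(-score, kind="stable")                  # descending susceptibility
    cum_share = np.cumsum(landslide[order]) / landslide.sum()  # cumulative landslide share
    sorted_scores = score[order]
    t_high, t_medium = (sorted_scores[np.searchsorted(cum_share, c)] for c in cuts)
    classes = np.full(score.shape, "low", dtype="<U6")
    classes[score >= t_medium] = "medium"
    classes[score >= t_high] = "high"
    return classes, (t_high, t_medium)

# Toy raster of 100 cells with 20 landslide cells (hypothetical data).
score = np.arange(100, 0, -1)
landslide = np.zeros(100, dtype=bool)
landslide[:17] = True
landslide[[30, 31, 90]] = True
classes, thresholds = zone_susceptibility(score, landslide)
print(thresholds)  # thresholds at scores 84 and 69
```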

Validation
Evaluating the predictive capacity of the model is an essential step (Beguería 2006). According to Frattini et al. (2010), a model must meet three basic criteria to be acceptable: i) conceptual and mathematical adequacy in describing the behavior of the natural system; ii) robustness to small changes in the database; and iii) accuracy of the recorded data. Since no model is perfect, it is necessary to know its degree of confidence (Remondo et al. 2003).
Validation can be understood as a test of the ability of the model to reflect the real environment, evaluating its accuracy and predictive capacity (Beguería 2006). Studies without some type of validation have no scientific value, and since it is not feasible to wait for new events to verify the capacity of the model, the inventory is divided so that one part is used to build the model (training group) and the other to evaluate the results (test group) (Chung & Fabbri 2003).
Only 50% of the mapped scars were used to build the landslide susceptibility models. To validate the models, success curves, which use the same portion of the inventory employed in model construction, were computed together with prediction curves, which use the portion of the inventory not used until then. The areas under the success and prediction rate curves were calculated to facilitate verification of the results and to assign a numerical value to the graphs.

Prediction and success rates
The success and prediction rate curves result from integrating, as cumulative percentages in descending order, the susceptibility indices generated by the model with the sites considered unstable in the landslide inventory (Chung & Fabbri 1999). These graphs plot the percentage of the study area, in descending order of susceptibility, on the abscissa against the cumulative percentage of landslides on the ordinate (Oliveira 2012). The curves therefore express the fraction of the area required to account for a given percentage of landslides (Garcia 2012). For the numerical interpretation of these curves, the AUC was calculated (Garcia et al. 2007). The difference between the success and prediction curves lies in the portion of the inventory used (Sterlacchini et al. 2011). The success curve evaluates the agreement between the model and the data used to produce it. The prediction curve represents the ability of the model to predict future events over an undefined period of time (Zêzere 2006, Chung & Fabbri 2008), since it uses data different from those used to build the model. According to Barella (2016), similar success and prediction curves may indicate an accurate landslide inventory, as well as an adequate division of the data into training and test groups.
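A minimal NumPy sketch of a rate curve and its AUC, assuming one susceptibility score and one boolean landslide flag per cell. Passing the training landslides gives the success curve; passing the test landslides gives the prediction curve. Cells with tied scores are treated as one step, and the function names are hypothetical.

```python
import numpy as np

def rate_curve(score, landslide):
    """Cumulative area fraction (x) vs cumulative landslide fraction (y), descending score."""
    score = np.asarray(score, dtype=float).ravel()
    landslide = np.asarray(landslide, dtype=bool).ravel()
    x, y = [0.0], [0.0]
    for level in np.unique(score)[::-1]:          # highest susceptibility first
        in_level = score == level                 # tied cells form a single step
        x.append(x[-1] + in_level.sum() / score.size)
        y.append(y[-1] + landslide[in_level].sum() / landslide.sum())
    return np.array(x), np.array(y)

def curve_auc(x, y):
    """Trapezoidal area under the curve."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy raster of four cells; the only landslide lies in the most susceptible cell.
x, y = rate_curve([3, 2, 1, 0], [1, 0, 0, 0])
print(curve_auc(x, y))  # 0.875
```

In use, the same score raster would be passed once with the training landslide mask and once with the test mask, and the two AUCs compared. The per-level loop is simple but slow for continuous scores; sorting once and grouping ties would be faster.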

RESULTS AND DISCUSSIONS
Five landslide predisposing factors were used to build the model: slope angle, geomorphology, soil, geology and slope curvature. Three different partitions of the inventory were tested, and the whole statistical procedure was executed for each; the partition that presented the most accurate results and the smallest difference between the success and prediction curves was chosen.
The information values of each landslide predisposing factor, which were the basis for the map algebra, are shown in Table I. It is noteworthy that the slope angle and geomorphology parameters presented an AUC above 0.8.
Analyzing the AUC and IV indices, the most influential parameters for landslide development in the study area correspond to slope angles between 25° and 75°, generally associated with the geomorphological units of scarps, ridges and spurs. It should be noted that although the classes with slopes steeper than 50° have very high IV indices, they cover limited areas, not exceeding 0.5% of the total area.
The parameters used in the sensitivity analysis ($A_I$, $R_I$ and AUC) and their ranking are shown in Table II. The order of integration was defined by sorting the mean rank of each predisposing factor from lowest to highest.
The map algebra was performed by successively adding each landslide predisposing factor, according to the order of integration obtained in the sensitivity analysis (Table II). Since 5 factors were employed and the sum is performed factor by factor, 4 analyses were executed (Table III). For each analysis, the model was validated by calculating the success and prediction rates. Table IV shows that both rates rise with the successive insertion of landslide predisposing factors, and that the difference between them tends to increase with each inclusion, except in Analysis 2. Analysis 4, which combines all the predictive variables employed, presented the highest success and prediction rates as well as the greatest difference between them. Nevertheless, this simulation still constitutes a good-quality model, since the magnitude of this difference is very small.
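The progressive integration can be sketched as a running sum over the per-factor information value rasters, already sorted in the order of integration. With 5 factors this yields the 4 analyses described in the text; each result would then be validated with the success and prediction rates. The function name and the constant toy rasters are hypothetical.

```python
import numpy as np

def progressive_models(iv_maps):
    """Return cumulative sums: first two factors, then one more factor per analysis."""
    running = iv_maps[0] + iv_maps[1]   # Analysis 1
    models = [running]
    for iv in iv_maps[2:]:
        running = running + iv          # Analyses 2, 3, ...
        models.append(running)
    return models

# Five hypothetical per-factor IV rasters (constant values for illustration).
maps = [np.full((2, 2), float(i)) for i in range(5)]
models = progressive_models(maps)
print(len(models))  # 4
```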
The susceptibility classes were defined on the basis of landslide occurrence. In the classification used, the landslides were grouped into 85%, 10% and 5%, corresponding to the high, medium and low susceptibility classes, respectively (Table V). Thus, for example, 85% of the landslides are expected to occur in areas defined as highly susceptible.
At this point, the variation of the indices in Analyses 3 and 4 stands out, since the area classified as high susceptibility increases. This indicates that, although the AUC increased, the model did not restrict the most susceptible areas as expected. This incongruence arises from variations in the shape of the curves, since in the range that defines the high susceptibility class the curve of Analysis 3 locally departs from that of Analysis 4. Figure 4 shows the behavior of the success curve near the 85% prediction ratio, which would explain this result.
It should be noted that the slope aspect was also analyzed. However, during construction of the inventory, there was a strong tendency to catalog landslides on slopes facing S, SE and SW, mainly the older ones with some vegetation cover already established (Table VI).
There is indeed a difference between these groups, since the slopes facing south, southeast and southwest have a mean slope angle greater than that of the other areas (Table VI). In view of this finding, we sought to identify the factor controlling this trend.
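The comparison of mean slope by aspect group can be sketched as follows. This is a minimal illustration assuming aspect in degrees clockwise from north, the SE, S and SW octants taken as 112.5° to 247.5°, and flat cells coded as -1 (the ArcGIS Aspect convention); the function name and the toy values are hypothetical.

```python
import numpy as np

def mean_slope_by_aspect(slope_deg, aspect_deg):
    """Return mean slope of SE/S/SW-facing cells and of all other non-flat cells."""
    slope = np.asarray(slope_deg, dtype=float).ravel()
    aspect = np.asarray(aspect_deg, dtype=float).ravel()
    valid = aspect >= 0                                   # flat cells (-1) are excluded
    south = valid & (aspect >= 112.5) & (aspect < 247.5)  # SE, S and SW octants
    other = valid & ~south
    return slope[south].mean(), slope[other].mean()

# Four toy cells: two south-facing, one north-facing and one flat.
south_mean, other_mean = mean_slope_by_aspect([30, 20, 10, 40], [180, 135, 0, -1])
print(south_mean, other_mean)  # 25.0 10.0
```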
Since geological structures were not included in the statistical analysis, a visual analysis was carried out in search of cataclinal slopes, with penetrative discontinuities dipping in the same direction as the slope (Cruden & Hu 1998, Cruden 2000). However, this visual inspection did not show sufficient evidence of an influence of the discontinuities on the investigated process. We therefore decided to analyze the insolation direction, since it can influence vegetation distribution and soil moisture content (van Westen et al. 2008, Corominas et al. 2014) and can also affect the quality of aerial/orbital images (Rogers & Doyle 2003). Regarding vegetation, no trend related to slope direction was observed; regarding moisture content, no studies were carried out. Regarding the aerial/orbital images, it was verified that slopes facing south and adjacent directions always gave a better depiction of ancient landslides.
Although two field verification steps were conducted, the inventory was mostly based on remote sensing, since extensive areas are private properties, mainly condominiums and mines, and access is scarce in several places. In view of these observations, we decided to remove the slope aspect variable, because the inventory tended to attribute great weight to slopes facing south and adjacent directions: slopes facing north, northeast and northwest presented pixels with attenuated information values, whereas slopes facing south, southeast and southwest prevailed with high information values (Figure 5).
The final model integrates the predisposing factors geomorphology, slope angle, soil, geology and slope curvature. The high susceptibility areas are concentrated in the northwest, west and center-south portions of the study area and occupy 14.5% of the territory (Figure 6).
(1, 2, 3, 4 and 5): hierarchization of the parameters.
Table III. Order of integration of the landslide predisposing factors.

ID Predisposing factors
Analysis 1 Geomorphology + Slope angle

CONCLUSIONS
The susceptibility map shows the predisposition of the land to develop certain geodynamic processes. It is a basic tool for territorial planning aimed at disaster prevention and an important instrument for meeting the demands of Law 12,608 (Brazil 2012).
The statistical analysis using the information value method was efficient, with high predictive capacity and low execution cost. Another notable feature is the possibility of revising the data during the study, since the model can be refined whenever new information is acquired. It should be noted that this method demands a diversified cartographic database at a suitable scale, which is not available in most of Brazil. The AUC, although effective and generally able to portray the robustness of the models produced, may not lead to the selection of the most efficient simulation. This occurs because of the different shapes the curves can take along their path, since curves produced by models with different AUC values may cross one another. Therefore, a higher prediction rate does not guarantee a greater restriction of the high susceptibility zones and does not necessarily lead to the choice of the best statistical forecast model.
As observed by Corominas et al. (2014), the slope angle was strongly correlated with the landslides, but attention should be paid to the importance of the other cartographic bases. The geomorphology used in the study presented a very high predictive index and, by itself, would allow the construction of a good predictive model.
The landslide inventory is a key element for the quality of the model, since all landslide predisposing factors are related to it. The identification and delimitation of events are partly subjective, since they are influenced by the knowledge and experience of the responsible professional, as well as by the accuracy of the topographic model, the resolution and representativeness of the images, and the morphological alteration of the landslide surface due to weathering and vegetation growth.
The position of the sun during image acquisition greatly influences the construction of the landslide inventory, highlighting or hiding surface structures depending on the incidence angle of the light, and can distort the final model, especially when slope aspect is used. Therefore, using a diversified image base and conducting field checks can give more credibility to the inventory and avoid misinterpretations in the predictive model. This important issue requires further investigation.