Cluster analysis for slope geotechnical prioritization of intervention for the Estrada de Ferro Vitória-Minas

This article proposes a geotechnical prioritization of intervention for slopes with landslide scars along the Estrada de Ferro Vitória-Minas by means of cluster analysis, and also proposes a relationship between the area and volume of landslide scars. Defining clusters supports decision-making on containment measures, mapping and the study of landslides along the Estrada de Ferro Vitória-Minas. The database comprises four variables: slope height, slope inclination, scar area and scar volume. Gower's index was used as the distance measure and Ward's method was used to build the clusters. Eight characteristic groups were identified. The analysis identified stretches that need attention because of their propensity for landslides, notably Group 7, which contains stretches 362+600, 093+xxxE and 419+000. Group 7 presented high values of scar area and volume: areas between 7.49 × 10⁴ m² and 9.75 × 10⁴ m², and volumes between 4.08 × 10⁵ m³ and 9.20 × 10⁵ m³. It also presented high ranges of slope height and inclination. Taken together, these results indicate that the stretches in Group 7 are predisposed to landslides and should have priority for intervention measures. The relationship between scar area and volume obtained in this research was compared with the relationships established in the literature.


Introduction
Predicting the area and volume of rock mass involved in landslides is extremely complex. In some cases, landslides in different parts of the world have been described by empirical equations. These empirical relationships allow landslide parameters to be estimated as a function of characteristics that can be determined, such as slope inclination and height. They also support the assessment and delimitation of risk areas, making it possible to coexist with natural phenomena, and the preparation of programs for prevention, alert and the construction of containment works (Polanco, 2010).
This study took as its reference the railroad line of the Estrada de Ferro Vitória-Minas (EFVM), Vale S.A., which runs through the states of Minas Gerais and Espírito Santo. The railroad is 929 km long in total, and its trunk line is 540 km long. The dominant lithologic classes along the EFVM trunk line are granites, gneisses, gneisses with mafic intrusives, metasedimentary rocks with granitic intrusions, shales and gneisses, and unconsolidated sediments.
In 1997, Vale S.A. received, through a contract signed with Brazil's government, the concession for the complete operation of its rail transportation services. EFVM handled 119.2 million tons of ore, in addition to cargo transported for other companies, such as coal and agricultural products. EFVM also transported almost one million passengers in 2014 (Vale, 2017).
The present article proposes to identify clusters among the slopes with scars in a specific section of the Estrada de Ferro Vitória-Minas; this clustering makes it possible to sectorize the railroad and to prioritize intervention on slopes with landslides. The prioritization is based on four variables analyzed by cluster analysis: slope height, slope inclination, scar area and scar volume. Defining the clusters supports decision-making on containment measures, mapping and the study of landslides along the Estrada de Ferro Vitória-Minas. In this context, this work can contribute to studies of landslides in Brazil by proposing empirical relationships for landslide control estimates.

Materials and methods
The database used in this article was compiled from the data of Gomes (2014), combined with analysis of Google Earth Pro images and fieldwork on the studied stretches of the railway.
The variables collected on the slopes are slope height (HT) and inclination (IT). These variables are qualitative, classified by Gomes (2014) into the bands presented in Tables 1 and 2. The scar area and scar volume variables are quantitative and were estimated from fieldwork combined with Google Earth Pro images and tools. Figure 1 shows stretch 493+600 with a landslide scar, in a field image and in a Google Earth Pro image. The database processing, the cluster analysis and the final script were carried out in the R software (R Core Team, 2016), version 1.0.136.
Ward's method proved the most suitable for the database. According to Ward (1963), the method uses the individuals themselves to construct an overall measure of group heterogeneity (W); that is, the method starts by assuming that each individual forms its own group. W is the sum of the squared Euclidean distances between each individual, i, and the mean of its corresponding group, g (Equation 1).

$$W = \sum_{g} \sum_{i \in g} \lVert x_i - \bar{x}_g \rVert^2 \qquad (1)$$

where $x_i$ is the vector of values observed for individual i and $\bar{x}_g$ is the mean vector of group g.

The database contains both quantitative and qualitative variables (mixed variables), so Gower's distance was used as the measure of similarity. Gower's distance was defined by Gower (1971) and is presented in Equation 2.

$$G_{jk} = \frac{\sum_{i} W_{ijk}\, S_{ijk}}{\sum_{i} W_{ijk}} \qquad (2)$$

where:
$G_{jk}$ - Gower's distance between individuals j and k
$W_{ijk}$ - weight of variable i between individuals j and k
$S_{ijk}$ - partial similarity of variable i between individuals j and k
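As an illustration of how a Gower coefficient handles mixed variables, the following is a minimal Python sketch, not the script used in the study. It assumes the usual partial similarities: 1 minus the range-scaled absolute difference for quantitative variables, and 1 or 0 for matching or non-matching qualitative bands. The function name `gower_similarity`, the equal weights and the three example slopes are hypothetical. The conversion to a dissimilarity as 1 minus the coefficient is also an assumption.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def gower_similarity(
    rows: Sequence[Sequence[Value]],
    is_numeric: Sequence[bool],
    weights: Sequence[float],
) -> List[List[float]]:
    """Return the matrix of coefficients sum(w * s) / sum(w) for every pair of rows."""
    n = len(rows)
    # Range of each quantitative variable, used to scale absolute differences.
    ranges = []
    for i, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[i]) for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)

    sim = [[1.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            num = 0.0
            den = 0.0
            for i, numeric in enumerate(is_numeric):
                if numeric:
                    if ranges[i] == 0.0:
                        s = 1.0  # constant variable: all individuals agree
                    else:
                        diff = abs(float(rows[j][i]) - float(rows[k][i]))
                        s = 1.0 - diff / ranges[i]
                else:
                    s = 1.0 if rows[j][i] == rows[k][i] else 0.0
                num += weights[i] * s
                den += weights[i]
            sim[j][k] = sim[k][j] = num / den
    return sim


# Hypothetical slopes: (height band, inclination band, scar area, scar volume).
slopes = [
    ("high", "steep", 9.0e4, 9.0e5),
    ("high", "steep", 8.0e4, 5.0e5),
    ("low", "gentle", 1.0e3, 2.0e3),
]
similarity = gower_similarity(slopes, [False, False, True, True], [1.0, 1.0, 1.0, 1.0])
# Assumed conversion to a dissimilarity for use in clustering.
distance = [[1.0 - s for s in row] for row in similarity]
```

In this toy data the first two slopes share both bands and have similar scar sizes, so they end up much closer to each other than either is to the third slope.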

Results and discussions
To obtain a parametric relationship between scar area and scar volume, a regression was performed between the two variables. Figure 2 presents the result.
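A regression of this kind is often carried out as a least-squares fit on log-transformed values, which yields a power law between volume and area. The sketch below shows that approach under the assumption of base-10 logarithms; whether the study fitted the data this way is not stated. The function name `fit_power_law` and the example values are hypothetical and are not data from the EFVM database.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(areas: Sequence[float], volumes: Sequence[float]) -> Tuple[float, float, float]:
    """Fit V = c * A**b by least squares on log10 values; return (c, b, r)."""
    x = [math.log10(a) for a in areas]
    y = [math.log10(v) for v in volumes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # exponent of the power law
    c = 10 ** (mean_y - b * mean_x)    # multiplicative coefficient
    r = sxy / math.sqrt(sxx * syy)     # correlation of the log-transformed values
    return c, b, r


# Hypothetical scar areas, with volumes generated from an exact power law.
example_areas = [1.0e2, 1.0e3, 1.0e4, 1.0e5]
example_volumes = [0.5 * a ** 1.2 for a in example_areas]
c, b, r = fit_power_law(example_areas, example_volumes)
```

Because the example volumes follow an exact power law, the fit recovers the generating coefficient and exponent and the correlation is essentially 1; field data would scatter around the fitted line.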

Figure 2
Relation between variables: scar's area and volume.
The correlation coefficient between the scar area and volume variables was 0.66. Equations 3 and 4 present the relationship between the two variables, obtained from the graph in Figure 2, where A is the scar area and V is the scar volume.

$$\log(V) = 0.7338\,\log(A)^{1.2249} \qquad (3)$$

$$V = 0.1416\,A^{1.2237} \qquad (4)$$

Equations 3 and 4 can be compared with the relationships proposed by Simonett (1967), Rice et al. (1969), Innes (1985), Guthrie and Evans (2004), Korup (2005), Imaizumi and Sidle (2007), Guzzetti et al. (2008) and Imaizumi et al. (2008). The main differences between the relationships may be due to the local geology, the data collection period and the number of slopes studied.
Gower's distance was used to calculate the similarities between the slopes, and Ward's method was used to create the groups. The cophenetic correlation for Ward's method was 0.68, an adequate value for the research. The number of groups was chosen with the help of the chart of distances between groups and the dendrogram analysis presented in Figure 3. From the analysis of the dendrogram, presented in Figure 4, together with the characteristics of the slope scars, eight groups were selected for the studied stretches of the EFVM.
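The grouping step can be sketched as an agglomerative procedure that repeatedly merges the two closest groups until the desired number remains. The Python sketch below uses the Lance-Williams update for Ward's criterion applied to squared input distances. This is one common convention and an assumption here, since the text does not say which variant was used. The function name `ward_clusters` and the four-point distance matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ward_clusters(dist: Sequence[Sequence[float]], n_groups: int) -> List[int]:
    """Merge groups with Ward's Lance-Williams update until n_groups remain."""
    n = len(dist)
    if not 1 <= n_groups <= n:
        raise ValueError("n_groups must be between 1 and the number of individuals")
    # Each individual starts as its own group.
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    # Squared distances between current groups, keyed by (smaller id, larger id).
    d2: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(members) > n_groups:
        a, b = min(d2, key=d2.get)  # closest pair of groups
        na, nb = len(members[a]), len(members[b])
        dab = d2.pop((a, b))
        merged = members.pop(a) + members.pop(b)
        new_d: Dict[Tuple[int, int], float] = {}
        for c_id, c_members in members.items():
            nc = len(c_members)
            dac = d2.pop((min(a, c_id), max(a, c_id)))
            dbc = d2.pop((min(b, c_id), max(b, c_id)))
            # Lance-Williams formula for Ward's criterion.
            new_d[(c_id, next_id)] = (
                (na + nc) * dac + (nb + nc) * dbc - nc * dab
            ) / (na + nb + nc)
        d2.update(new_d)
        members[next_id] = merged
        next_id += 1
    # Label groups in order of their lowest-numbered individual.
    labels = [0] * n
    for label, group in enumerate(sorted(members.values(), key=min)):
        for i in group:
            labels[i] = label
    return labels


# Hypothetical distances between four individuals located at 0, 1, 10 and 11.
positions = [0.0, 1.0, 10.0, 11.0]
example_dist = [[abs(p - q) for q in positions] for p in positions]
labels = ward_clusters(example_dist, 2)
```

With two groups requested, the two individuals near 0 and the two near 10 are placed in separate groups. In practice the input matrix would hold the Gower distances between slopes, and the number of groups would be read from the dendrogram.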
For the quantitative variables, scar area and scar volume, the cluster technique was able to separate bands of area and volume that characterize each group. Figures 4 and 5 show boxplots of the area and volume variables, respectively, with the bands that make up each group. For scar area, the bands of some groups overlap; however, this does not happen across the whole set of values, because each band presents unique values. For volume, the bands differ mainly in their maximum values, with little difference in the minimum values, except for Group 7.
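The bands discussed above can be summarized as the minimum and maximum of a variable within each group, and two bands overlap when their intervals intersect. The following Python sketch shows that bookkeeping; the helper names `group_bands` and `bands_overlap`, the scar areas and the group labels are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def group_bands(values: Sequence[float], groups: Sequence[int]) -> Dict[int, Tuple[float, float]]:
    """Return the (minimum, maximum) of the values observed in each group."""
    bands: Dict[int, Tuple[float, float]] = {}
    for value, group in zip(values, groups):
        low, high = bands.get(group, (value, value))
        bands[group] = (min(low, value), max(high, value))
    return bands


def bands_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Two closed intervals overlap when each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical scar areas (m²) and the group assigned to each slope.
example_areas = [9.0e4, 8.0e4, 1.0e3, 5.0e3, 4.0e3]
example_groups = [7, 7, 1, 1, 2]
area_bands = group_bands(example_areas, example_groups)
```

Here the bands of groups 1 and 2 overlap, while the band of group 7 is separated from both, which is the kind of pattern a boxplot per group makes visible.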
Analyzing the final clustering results for the whole database, with qualitative and quantitative variables, it was possible to build Table 4, which presents the final characteristics of each group. Among the grouped stretches, Group 7 stands out as the group that needs the most care with respect to landslide development. In Group 7, stretches 362+600, 093+xxxE and 419+000 present high values of scar area and volume with high ranges of slope height and inclination, which can be interpreted as a predisposition for landslide occurrence. If intervention measures are taken, these are the stretches that should be given priority.
Other stretches that need attention are in Group 3. Group 3 has seven slopes with high values for slope height and inclination, as well as ranges of values for scar area and volume. The stretches in Group 6 present average values for slope height with high slope inclinations and, when compared with the other groups, stand out for their high values of scar area and volume.
The geographical aspect was also analyzed, which made it possible to sectorize the EFVM stretches by cluster analysis. Each group is identified by the characteristics of the stretches it contains. Table 5 presents the EFVM stretches with the results of the cluster analysis.

Conclusions
The application of cluster analysis allowed the identification of eight groups with intrinsic characteristics. Among the classified groups, the stretches of Group 7, such as 362+600, 093+xxxE and 419+000, show a tendency for landslides because of their high values of slope height and inclination, along with considerable dimensions of scar area and volume.
Groups 3 and 6 can also be cited, with high slope height and inclination values and considerable scar area and volume values.

Figure 1
Partial view of the landslide scar in stretch 493+600 of the EFVM (left). Google Earth Pro image from 2016 of stretch 493+600, with a polygon marking the scar (right).

Table 2
Slope's inclination variable definition.
Table 3 presents the relationships cited and the relationship found in this research; the relationship found here is in agreement with the literature.

Table 4, which presents the final characteristics of each group.

Table 4
Final characteristics of groups.