Methodology for risk management in dams from the event tree and FMEA analysis

Abstract

Some studies that analyze the risk of dam failures estimate that about 30 major tragedies should be expected between 2016 and 2025. Failure records between 1900 and 2014 indicate an average of three ruptures every two years, considering only the failures that were officially registered and investigated. The potential for dam failures is driven by the economy, since cost has been the main variable considered in the design, construction, operation, monitoring and closure plan of these structures. Companies reduce investments in maintenance, risk management and failure prevention as a means of economic recovery, product price competitiveness and debt reduction, as required by investors. The result has been a decrease in specialized labor, to the point that companies no longer have sufficient knowledge of the engineering and operational skills that apply to tailings and water management. Learning from dam tragedies is practically non-existent, in Brazil and worldwide, which leads to catastrophic environmental and social consequences. Failures will continue to occur as long as they are viewed and treated as unpredictable and, therefore, left without risk management. The risk management method proposed in this paper uses inspection and instrumentation information to identify risks from event trees and separates them into intolerable, tolerable and acceptable risks. The intolerable risks are carried forward to an FMEA-type failure analysis, in which severe, intermediate and mild failures are assessed. The objective is to enable the development of an assertive and effective action plan for dam safety management.

Keywords
Dam; Failures; Risk management; Failure management

1. Introduction

Dam ruptures have been more frequent than expected worldwide, with an average of two events per year, even as new regulatory and inspection measures are implemented. The main causes are related to deficiencies in geological-geotechnical investigations, hydrological design and management systems, and they result in deaths, economic loss and, usually, irreversible environmental devastation. In this sense, the regulations on inspections are part of the process and, by themselves, do not guarantee safety; the management of operation and maintenance routines is essential, according to Fernandes (2020).

Dams provide many benefits for society, but the floods resulting from failures produce devastating scenarios, since the extent of flooding is large and places the downstream population in a risk zone. Of the immense number of dams that have failed over the years, according to ICOLD (2001), three ruptures stand out in terms of the number of victims: Vajont in Italy in 1963, with 2.6 thousand victims; Johnstown in Pennsylvania in 1889, with 2 thousand; and Machhu II in India in 1974, with 2 thousand victims. Costa (1985) reports that the average number of deaths in dam ruptures is 19 times higher when there is no warning system in place.

A good integrated geotechnical risk management system must consider the involvement of people, with high-performance, qualified and dedicated teams. The processes used need to contain safety management elements, in which the operation, maintenance and emergency management routines are established and the guidelines to be followed for each emergency level identified in risk situations are defined. In addition, risk management must be detailed and is generally supported by information systems that assist in data control.

The lack of commitment by the company's top management is usually noticeable through the implementation of inadequate management procedures. There is no efficient management without the support of physical and financial resources, and good practices are only applied if they are supported by the top administrative levels. For tailings dams, the scenario is expected to be even worse, since tailings are a rejected portion of the resource and actions linked to their disposal bring no direct financial return. There is a continuous tendency to reduce costs to a minimum by reducing staff, not conducting research and not investing in monitoring and safety. Total quality programs are prioritized for the product and not for the condition of reservoirs and containment structures. However, in the event of failures, the costs of damages and reparations, the loss of prestige with society and the reduction of the company's market value are much greater than the savings made by neglecting good techniques and practices.

2. Risk and failure management

The issues related to the design, construction, operation and maintenance of dams are very specific and depend on variables that must be thoroughly analyzed and evaluated over the life of the structure.

A dangerous condition is a situation with the potential to cause human or environmental damage, and a dangerous event involves a danger and leads to disastrous consequences. Risk, on the other hand, is the combination of the probability of occurrence and the consequence of a dangerous event, being a function of the severity and frequency of a given situation. In other words, according to Whitman (1984), it is the relationship between danger and consequence, with danger defined as the temporal probability of the occurrence of a threat and consequence as the combination of the vulnerability, exposure and utility of the elements at risk.

Within the risk analysis, it is essential to analyze all possible failure modes in order to determine the probability of occurrence of each scenario. The objective of the risk analysis is to obtain the probability of rupture or failure of the dam for each failure mode, identifying the most critical paths, that is, the events most likely to occur. Companies must give guarantees to society regarding the operation of waste and water deposits, defining tolerable risk limits.

Deterministic analyses evaluate the nominal case (a single scenario), without considering the entire range of plausible results, and do not quantify the probability of the result. Probabilistic analysis, on the other hand, identifies the uncertainties that are fundamental to safety and tries to include all plausible scenarios, their probability and their consequences. Generally, this type of uncertainty is represented by a normal statistical distribution, and the methods consider a mean, a standard deviation (SD) and a coefficient of variation, given by the ratio of the standard deviation to the mean (CoV = SD / mean), as noted in Figure 1 (Lacasse et al., 2019).

Figure 1
Shear tension in the normal distribution. Source: Lacasse et al. (2019).
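As a minimal sketch of the statistics mentioned above, the mean, standard deviation and coefficient of variation of a set of measurements can be computed as follows. The shear strength values and the function name are hypothetical and serve only to illustrate the ratio CoV = SD / mean; they are not taken from the cited work.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return (mean, SD, CoV) for a sequence of positive measurements."""
    m = mean(samples)
    sd = stdev(samples)  # sample standard deviation (n - 1 denominator)
    return m, sd, sd / m


# Hypothetical shear strength values in kPa, for illustration only
shear_strength_kpa = [48.0, 52.0, 50.0, 55.0, 45.0]

m, sd, cov = coefficient_of_variation(shear_strength_kpa)
print(f"mean = {m:.1f} kPa, SD = {sd:.2f} kPa, CoV = {cov:.3f}")
```

A larger CoV indicates a wider spread around the mean, which is the "large uncertainty" case discussed in the following paragraphs.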

According to Londe (1995), the safety margin M, which corresponds to our safety factor (FS), is obtained by subtracting the load from the resistance and must necessarily be greater than or equal to zero for a safe design condition. In Figure 2, the safety margin condition considers the probability of failure (Pf), defined by the potential overlap of the load and resistance uncertainty distributions. In terms of resistance, the safety factor is the ratio between the resisting and driving moments, with values greater than 1.0 representing a proportionally greater structural capacity to resist the destabilizing forces.

Figure 2
Safety margin considering the probability of failure. Source: Londe (1995).

Thus, given the particularities of each project, each situation is expected to have its own safety margin and very different failure probabilities for small and large uncertainties, as shown in Figure 3.

Figure 3
Small and large uncertainties in the safety margin. Source: Londe (1995).

The legislation of most countries requires the safety factor of a dam to be greater than 1.3 or 1.5. In reality, the safety factor is not the most relevant criterion for ensuring the safety of the dam, since it represents a spectrum of failure probabilities. A dam with a safety factor of 1.4 may be less vulnerable than one with a safety factor of 1.79 but a higher level of uncertainty, which results in a greater probability of failure.

Numerically, a higher safety factor does not necessarily mean a greater safety margin, and the reliability index (β) must be calculated, SD being the standard deviation. However, the uncertainties in the parameters that define the safety factor are not the only influence on the final safety of the dam. Other aspects that are not accounted for in the safety factor are the quality of engineering, construction and operation. A good design, careful execution and an operation that follows international recommendations are factors that directly and positively influence the risk of the dam. In this sense, the reliability index (β) and the failure probability (Pf) can be related by assuming a normal distribution, where the higher the value of β, the lower the probability of failure (Figure 4).

Figure 4
Reliability index x Probability of failure. Source: Lacasse et al. (2019).
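The comparison between a dam with FS = 1.4 and one with FS = 1.79 can be illustrated with a short sketch. It assumes, as a common simplification that is not stated in the text, that the safety factor is normally distributed and that the reliability index is β = (mean FS − 1) / SD, so that Pf = Φ(−β). The standard deviations below are hypothetical values chosen only to show the effect of uncertainty.

```python
import math


def reliability_index(fs_mean, fs_sd):
    """Assumed definition: distance of the mean FS from the limit FS = 1, in SD units."""
    return (fs_mean - 1.0) / fs_sd


def failure_probability(beta):
    """Pf = Phi(-beta), with Phi the standard normal CDF (normality is an assumption)."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


# Hypothetical standard deviations, for illustration only
beta_low_uncertainty = reliability_index(fs_mean=1.40, fs_sd=0.10)
beta_high_uncertainty = reliability_index(fs_mean=1.79, fs_sd=0.40)

pf_low_uncertainty = failure_probability(beta_low_uncertainty)
pf_high_uncertainty = failure_probability(beta_high_uncertainty)

print(f"FS = 1.40: beta = {beta_low_uncertainty:.2f}, Pf = {pf_low_uncertainty:.2e}")
print(f"FS = 1.79: beta = {beta_high_uncertainty:.2f}, Pf = {pf_high_uncertainty:.2e}")
```

Under these assumed values, the dam with the lower safety factor has the higher reliability index and the lower failure probability, which is the situation described in the text.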

In this way, discussing uncertainties always leads to a better understanding of what is important for the design, safety assessment and performance monitoring, in relation to whether FS values are acceptable or not. Issues related to what the safety objective should be during the life of a dam, as well as whether fixed deterministic safety factors are appropriate to ensure the same level of safety throughout the life of the dam, should be constantly evaluated. A dam in operation for 50 years represents at least 50 years of experience under operational and environmental loads, and the uncertainties present at the time of design and construction will have changed over time. In this sense, an annual failure probability based on the performance of the structure and on monitoring and inspection data allows a more consistent comparison of the safety level at different times in the life of a dam than a pre-established, fixed safety factor.

2.1 Risk analysis

Some companies accept risk passively, and others create competitive advantage by exposing themselves to risks in a prudent and reasoned manner. The definition of risk includes the possibility of loss, damage, disadvantage, negative impact, danger or threat of a specific event. There is no zero risk, and all activities involve a certain degree of risk, which must be understood and managed so that it is minimized as far as possible. The concept of risk has acquired wide social and industrial prominence, constituting an operational concept widely used in engineering and management. According to ABNT (2018), it is associated with an event, being a quantity that results from the combination of the probability and the severity of consequences due to potential failures.

Risk management must be developed in stages, based on the principle of knowing what type of risk is being considered. The next step, risk analysis, consists of identifying the threat and its causes, as well as estimating the risk according to the severity of the damage and the frequency of occurrence. Risk assessment identifies what can be done to reduce the risk, followed by the control stage, in which the procedures for reducing and mitigating the event are designed and implemented. Finally, the results obtained are analyzed, the previous steps are evaluated and the model is checked to see whether it is satisfactory. Several methodologies have been developed for risk management, and the choice is made according to the author's preference.

3. Proposed method

Risk assessment, as a whole, is the process in which quantitative or qualitative risk estimation is considered, along with all social, environmental, temporal and other aspects, assessing the consequences of a failure and determining an action plan to mitigate or accept the risk. This analysis must necessarily be performed by specialists in several fields, such as geologists, hydrologists, and geotechnical, hydraulic and structural engineers, among others.

Risk management has been widely used in industry since the 1960s, according to Kloman (1992), but it was only in the late 1980s that the concept was incorporated into the decision-making process related to dams. However, dam safety management must be very specific, differentiated for each structure, region, country and, mainly, enterprise.

According to Fernandes (2020), the event tree methodology has been very useful for assessing risks in dams because it uses data from field inspection sheets, which are standardized routines carried out with reasonable frequency. In this way, it can be updated frequently, generating increasingly assertive and directive parameters for the structure being considered, in addition to providing guidance for dam safety management.

The first step is to have a well-defined inspection sheet, with all areas of the structure mapped, as well as an applicable list of possible anomalies. Fernandes (2017) presents a model form, emphasizing that it must be customized according to the particularities of each dam. The perception of possible anomalies associated with the functionality of a structure, and its respective performance, triggers a process of verification of probabilities, determining which decisions or recommendations should be prioritized.

In this section, a new risk analysis methodology is proposed, based on the event tree model but entirely directed at tailings or water dams. This methodology will be called the “proposed method” and basically consists of calculating a probabilistic risk (RP), based on events and probabilities resulting from the progression of an anomaly identified in inspection sheets (called inspection probability, PI), following a logical and numerical order that depends on the magnitude (M), the danger level (NP), the anomaly probability (PA) and the selected failure mode. In addition, it considers a probabilistic anomaly description (DPR), which allows a better visualization of the associated risks.

For each region of the dam, the probable anomalies are listed and coded hierarchically, and the magnitude and the level of danger are defined for each one. An example of the coding is: B – Dam, B.1 – Upstream Slope 1, B.1.1 – Erosions.

Magnitude (M) defines the size and the evolution of the anomaly compared to previous inspections, based on what was verified in the field during the inspection used as the basis of the risk analysis. For magnitude, there are the following categories:

  • I. Insignificant anomaly with no apparent evolution;

  • P. Small anomaly with evolution over time;

  • M. Medium anomaly with no apparent evolution;

  • G. Large anomaly with evident evolution, or large-scale anomaly.

The danger level (NP) presents a numerical classification for the anomaly identified, based on the degree to which it impairs the stability and safety of the structure, as follows:

  • Normal, anomaly does not compromise dam safety;

  • Attention, anomaly does not immediately compromise the safety of the dam, but if it progresses, it can compromise it, and must be controlled, monitored or repaired;

  • Alert, anomaly compromises dam safety, and immediate measures must be taken to eliminate it;

  • Emergency, anomaly represents a high probability of dam failure.

In this way, at the end of the evaluation, it will be possible to establish a sequence of anomalies, by region, each with a magnitude and a level of danger, as shown in Table 1. Anomaly probability (PA) ranges from 1 to 100%, presented in decimals. Therefore, values will vary between 0 and 1.0, depending on the combination of magnitude and level of danger recorded in the inspection form. It should be noted that the values 0% and 100% can be disregarded because they are extremes and in order to guarantee a safety margin in relation to subjectivity when filling out the inspection form. The ranges must be defined based on the stability analyses and according to the potential failures of the structure verified in the history of regular inspections.

Table 1
Example of list of anomalies, magnitudes and levels of danger by the proposed method.

As an example, anomaly probability (PA) can be defined with a combination of:

  • Insignificant magnitude (I) with:

    • Normal NP (0) - Probability of 0.10;

    • Attention NP (1) - 0.15;

    • Alert NP (2) -; and

    • Emergency NP (4) - 0.25.

  • Small magnitude (P) with:

    • Normal NP (0) - Probability of 0.30;

    • Attention NP (1) - 0.40;

    • Alert NP (2) - 0.50; and

    • Emergency NP (4) - 0.55.

  • Medium magnitude (M) with:

    • Normal NP (0) - Probability of 0.60;

    • Attention NP (1) - 0.65;

    • Alert NP (2) - 0.70; and

    • Emergency NP (4) - 0.75.

  • Large magnitude (G) with:

    • Normal NP (0) - Probability of 0.80;

    • Attention NP (1) - 0.85;

    • Alert NP (2) - 0.90; and

    • Emergency NP (4) - 0.95.
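The combinations listed above can be held in a simple lookup table, sketched below. The values are the example values from the list; the entry for insignificant magnitude with Alert level is not given in the text and is deliberately left undefined. The names `PA_TABLE` and `anomaly_probability` are hypothetical, and in practice the ranges would be calibrated for each dam.

```python
# Example anomaly probabilities (PA) by magnitude and danger level, as listed above.
# None marks the one value that is not given in the text.
PA_TABLE = {
    "I": {"Normal": 0.10, "Attention": 0.15, "Alert": None, "Emergency": 0.25},
    "P": {"Normal": 0.30, "Attention": 0.40, "Alert": 0.50, "Emergency": 0.55},
    "M": {"Normal": 0.60, "Attention": 0.65, "Alert": 0.70, "Emergency": 0.75},
    "G": {"Normal": 0.80, "Attention": 0.85, "Alert": 0.90, "Emergency": 0.95},
}


def anomaly_probability(magnitude, danger_level):
    """Return the example PA for a magnitude code (I, P, M, G) and a danger level."""
    pa = PA_TABLE[magnitude][danger_level]
    if pa is None:
        raise ValueError(
            f"No PA defined for magnitude {magnitude!r} and level {danger_level!r}"
        )
    return pa


# Small anomaly with evolution over time, classified as Attention
print(anomaly_probability("P", "Attention"))
```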

Failure mode (MF), also defined as consequence, consists of the last event, subsequent to the progression of the anomaly, which leads the structure to a rupture condition. There are several failure modes that can be considered, such as overtopping, piping, structural problems like instability, liquefaction, deformation, and management issues.

The events are the unfolding of the observations made on the inspection sheet and of the analysis of the photographic report, with the possible and probable sequencing of the anomaly's progression up to the failure mode considered. The events are successive, that is, event 5 is an offshoot of event 4, which in turn is an offshoot of event 3, and so on. Enough events are defined to fully describe the progression of the anomaly to the failure mode. Generally, events are defined as the “nodes” of the tree and unfold into two or more branches.

The inspection probability (PI) is the product of the anomaly probability (PA) and the probability attributed to each event (PE). In general form, $PI_n = PI_{n-1} \times PE_n$, where $n$ is the event number, and for the first event, $PI_1 = PA \times PE_1$.

The numerical relationship of the probability of occurrence of each event analyzed is based on the photographic report and the observations of the inspection form. The events are complementary, that is, the sum of the branches of each “node” must be 100%. Also, for each event, the sum of the probabilities of all branches is 100%.
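The recursion for the inspection probability and the 100% check on each node can be sketched as follows. The function names and the event probabilities in the example are hypothetical; only the relations PI1 = PA x PE1 and PIn = PIn-1 x PEn come from the text.

```python
def inspection_probabilities(pa, event_probabilities):
    """Return [PI_1, ..., PI_n] along one path of the tree.

    PI_1 = PA * PE_1 and PI_n = PI_(n-1) * PE_n, with all values in decimals.
    """
    pis = []
    current = pa
    for pe in event_probabilities:
        if not 0.0 <= pe <= 1.0:
            raise ValueError(f"event probability out of range: {pe}")
        current *= pe
        pis.append(current)
    return pis


def node_is_complete(branch_probabilities, tol=1e-9):
    """Check that the branches leaving one node sum to 100% (1.0 in decimals)."""
    return abs(sum(branch_probabilities) - 1.0) <= tol


# Hypothetical path: PA = 0.40, followed by three successive events
path = inspection_probabilities(0.40, [0.6, 0.5, 0.3])
print([round(pi, 3) for pi in path])

# Hypothetical node with two branches
print(node_is_complete([0.6, 0.4]))
```

Because each factor is at most 1.0, the inspection probability can only decrease as the path moves deeper into the tree.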

If risk analysis is used as a prerequisite for failure analysis, such as the FMEA type, according to USACE (2014), it is essential that the final event of the tree consider processes of local and global instability.

Probabilistic risk (RP) is the product of anomaly probability (PA) by the inspection probability (PI) for each event. That is, it is the probability of each branch of the tree. It is interesting to organize them in priority order, that is, from the highest to the lowest probability.

Table 2 shows an example of anomaly probability (PA) and failure mode (MF) for certain anomalies. Figure 5 shows an example of an event tree for anomaly B.1.1, of erosions in the upstream slope.

Table 2
Probability example (PA) and failure modes (MF) for certain anomalies, by the proposed method.
Figure 5
Anomaly Event Tree B.1.

After calculating the Probabilistic Risk (RP) for each anomaly, the Probabilistic Anomaly Description (DPR) is performed, organizing the RP of each tree in sequencing, from largest to smallest. The description must be complete, starting with the anomaly and followed by the location where it was found, with the insertion of all events. It is generally easier to describe each block separately, starting with the branch with the fewest events and progressing to the one with the most events, like in Figure 5.

In the end, the RPs of all the trees are compiled and a priority sequence for treating anomalies is established, based on the probability of failure. The product of the proposed method is the risk analysis, that is, the definition of the probabilistic risks of the progressive sequencing of each anomaly, for certain failure modes.

The RPs can be grouped in acceptable and unacceptable zones, as proposed by Brazendale & Bell (1994) and previously presented. In this case, the extremes are defined as an acceptable risk range and an unacceptable risk range. Between these two bands lies the tolerable risk zone. For the proposed method, probabilistic risks (RP) equal to or greater than 20% represent a great potential for failure and should be considered in a more detailed analysis, as unacceptable risks. Tolerable risks are those in the range 10% ≤ RP < 20%, and acceptable risks are those with a value less than 10%.
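The three zones can be expressed as a small classification function. The thresholds are those stated above; the function name and the sample values are illustrative assumptions.

```python
def classify_rp(rp_percent):
    """Classify a probabilistic risk (RP), given in percent, into the three zones."""
    if not 0.0 <= rp_percent <= 100.0:
        raise ValueError(f"RP must be between 0 and 100, got {rp_percent}")
    if rp_percent >= 20.0:
        return "unacceptable"
    if rp_percent >= 10.0:
        return "tolerable"
    return "acceptable"


# Hypothetical RP values, sorted from highest to lowest as in the priority sequence
for rp in sorted([3.6, 24.0, 12.0], reverse=True):
    print(f"RP = {rp:5.1f}% -> {classify_rp(rp)}")
```

Only the branches classified as unacceptable would be carried forward to the FMEA-type failure analysis described next.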

3.1 Failure analysis

The analysis of failure modes of the FMEA type allows anomalies to be assessed from the perspective of the function in the structure where they occur, considering the occurrence, detection and severity. It is an in-depth analysis that allows the identification of the individual failure modes of each anomaly, exploring the consequences of the causes and effects.

The occurrence index (O) represents the probability of the occurrence of the anomaly that will result in a failure, that is, the frequency with which these failures can occur per year, ranging from 1 to 10 (Table 3).

Table 3
Occurrence index (O) - Proposed method.

The detection index (D) considers the possibility of detecting new failure modes before they occur and varies on a scale from 1 to 10 (Table 4). In the case of dams, it is an important index, since some failure-triggering mechanisms do not show clear signs, or do not show them in time for rupture to be avoided. That is why inspections must be carried out by an extremely qualified team, considering the highest evaluation criteria, since the situation involves a high risk and a high potential for damage.

Table 4
Detection index (D) - Proposed method.

The severity index (S) considers the impacts and damage resulting from the failure and also varies on a scale from 1 to 10, where the greater the severity, the greater the associated damage (Table 5). The word damage covers social, environmental and economic impacts, according to Fernandes (2020). According to the FMEA methodology (USACE, 2014), the multiplicative interaction between the three indexes occurs in the calculation of the risk potential number (RPN), represented in a two-dimensional risk matrix, characteristic of the model.

Table 5
Severity Index (S) – Proposed method.

The uncertainties of the geotechnical behavior of soils, mainly under the action of static and dynamic loads, signal the importance of a probabilistic analysis for an adequate assessment of the stability of the structures. In this way, risk analysis consists of verifying different components of a system, which interact, and the resulting scenarios can be more or less critical. It also allows for the definition and recognition of risks, resulting in a more effective and integrated action plan.

For each system and subsystem that has been assigned a code for the inspection sheet, a function, failure, final effect, cause, control and control type must be established, in addition to the RPN calculation.

For the function item, it is desirable to use transitive verbs, which have incomplete meaning on their own and need a direct or indirect object to complete it. Some examples of transitive verbs linked to the object are contain, retain, provide, drain and promote, among others. In this item, something related to the anomaly considered for the item is added.

For the failure item, the failure mode is associated in case the system is prevented from exercising the defined function and, generally, it is the direct negation of the function item. If the failure analysis is being used as a next step to the risk analysis by the proposed method, the failure modes defined for the anomaly probability (PA), associated with the anomaly considered for the item, can also be used.

The final effect is the consequence of the failure and is the item that must be evaluated together with the severity index (S). If the failure analysis is being used as a next step to a risk analysis, the final effect refers to the progressive unfolding of the anomaly, that is, the last event considered, which generally indicates a global or local instability. Each final effect must contain a numerical indicator of severity (S).

The cause must be assessed on two scales. On the micro scale, it is usually associated with an anomaly and, in this case, the anomalies listed in the risk analyses can be used. The progression of anomalies leads to a macro-scale assessment, which generally refers to issues related to inadequate designs, inefficient construction techniques or problems with maintenance of the structure. Each cause must contain a numeric indication of occurrence (O). If the failure analysis is being used as a step following a risk analysis, the probability of occurrence is interpreted according to the probabilistic risk (RP) of the risk analysis.

In the FMEA matrix of the proposed method, the mild failures are those in the green portions of the matrix, the intermediate ones are in the blue region and the severe ones are in the orange and red portions. The definition of colors by the proposed method considered that:

  • Collective damages are related to severe failures, therefore, severity index ≥ 6;

  • Individual damages are related to intermediate failures, therefore, 3 ≤ severity index < 6;

  • Isolated damages or no damage are related to mild failures, therefore, severity index < 3;

  • Occurrence index ≥ 6 (high to very likely) combined with severity index ≥ 9 are zones of severe failures;

  • Occurrence index ≥ 7 (high to very likely) combined with severity index ≥ 9 are zones of severe failures. The same is true for occurrence index = 6 (high) and severity index 10;

  • The definition considered the great impact of the damages and the high probability of the events occurring in the year;

  • The demarcation of the other severe, intermediate and mild zones follows the proportion of the severity of each failure when combined with the occurrence of the events.

The action plan established for the control of each cause can be broken down into as many activities as necessary in order to control what is causing a particular failure. These activities are listed as controls and, generally, measures are established for the adequacy and revision of the design, visual inspections, verification of instrumentation levels, stability analyses or more specific actions. Control type is generally defined as prevention or detection. Prevention refers to measures that require planning and the involvement of a multidisciplinary team, such as design adjustments. Detection, on the other hand, refers to activities that must be carried out directly on the structure and that do not require major interventions, being, in general, routine activities already established, such as inspection and monitoring. It is desirable that at least two controls be established for each cause and, consequently, two types of control, one for prevention and the other for detection. Each prevention x detection pair must contain a numeric detection index (D).

RPN is the product of the severity (S), occurrence (O) and detection (D) indexes. Table 6 presents an example of FMEA analysis for a dam, its upstream slope and its crest, with the respective indexes and RPN calculation, applying the index values established in the proposed method. The FMEA matrix considers the values of severity and occurrence; Figure 6 is the graphical representation for the case of Table 6. In this case, item “B = Dam” requires the greatest attention, as it lies in the most critical zone. In terms of RPN, the values calculated for B.2.a are higher but, because of the low occurrence, this item lies in a less critical zone, which still requires attention.

Table 6
Example of FMEA analysis for dam, slope and crest.
Figure 6
FMEA matrix for the example of dams.

According to the RPN values obtained in each analysis, the RPN should be segmented into at least three ranges, such as acceptable risk, tolerable risk and intolerable risk. The higher the RPN value, the lower the tolerance for a given event; that is, the response must be more assertive and immediate measures must be implemented.
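As a minimal sketch, the RPN calculation and its segmentation into three ranges could be coded as follows. The two thresholds are hypothetical placeholders, since the text asks for the ranges to be proposed from the RPN values obtained in each analysis. The function and constant names are also illustrative.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN as the product of the S, O and D indexes."""
    return severity * occurrence * detection


# Hypothetical cut-offs: the method leaves them to be set per analysis.
TOLERABLE_FROM = 100
INTOLERABLE_FROM = 300


def risk_band(rpn_value: int) -> str:
    """Place an RPN value in one of the three suggested ranges."""
    if rpn_value >= INTOLERABLE_FROM:
        return "intolerable"
    if rpn_value >= TOLERABLE_FROM:
        return "tolerable"
    return "acceptable"


value = rpn(severity=8, occurrence=3, detection=5)
print(value, risk_band(value))
```

With these placeholder cut-offs, S = 8, O = 3 and D = 5 give RPN = 120, which falls in the tolerable range.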

In defining the action plan, a priority order of actions based on the RPN could be developed, for example, as follows:

  • Priority 0 – B and B.2.a;

  • Priority 1 – B.2.b;

  • Priority 2 – B.1.a and B.1.b.
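One way to derive such an ordering is to rank the causes by descending RPN, with equal values sharing a priority level, as in the sketch below. This is an assumption for illustration: the labels and RPN values are placeholders rather than those of Table 6, and ranking by RPN alone ignores the position of each item in the FMEA matrix, which the method also takes into account.

```python
from itertools import groupby


def prioritize(rpn_by_cause: dict[str, int]) -> list[list[str]]:
    """Group causes into priority levels; index 0 holds the highest RPN."""
    # Sort by descending RPN, then by label for a stable, readable order.
    ranked = sorted(rpn_by_cause.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        [name for name, _ in group]
        for _, group in groupby(ranked, key=lambda kv: kv[1])
    ]


# Placeholder labels and values, for illustration only.
levels = prioritize({"cause-1": 120, "cause-2": 300, "cause-3": 120, "cause-4": 40})
for priority, causes in enumerate(levels):
    print(f"Priority {priority}: {', '.join(causes)}")
```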

4. Application of the proposed method

In this section, three dams are evaluated using the methodology described above, referred to as the proposed method. The identification of the dams and the photographic report of the structures have been omitted to protect the data of the companies that own them, given that there is no authorization for detailed disclosure. This omission does not compromise the study, since what is discussed and presented is the application of the method based on the inspection form and the identification of anomalies.

Dam A is a soil embankment structure with a homogeneous section, 6 m high and 200 m long at the crest, built to store water for a food industry, with construction completed in 1995. Dam B is a concrete structure with a maximum height of 10 m, a length of 130 m and a spillway designed for a 1,000-year return period (TR), built in 1930 to store water for electricity generation. Dam C is a soil embankment dam with a homogeneous section, 14 m high and 800 m long, built to store tailings from gold mining. Dam C was built in 1982 and its spillway was expanded in 2009.

The three structures were inspected in November 2019. The method is applied in the stages described below, so that the effectiveness of each stage can be assessed individually:

A. Complete the dam inspection form, which allows anomalies to be identified and their magnitude (M) and danger level (NP) to be determined. An example is provided in Fernandes (2017);

B. Assign values of M and NP to the anomalies identified as first time (PV), disappeared (DS), decreased (DI), remained constant (PC) or increased (AU). The categories of M and NP are described in section 3, with a summary model in Table 1;

C. Define the anomaly probability (PA) according to the intervals defined in section 3, following the example in Table 1;

D. Define the failure mode (MF);

E. List the anomalies in ascending order of magnitude (M) and danger level (NP), adding the anomaly probability (PA) and the failure mode (see the example in Table 2);

F. Design the event trees, considering the probability of the anomaly (PA) compromising the stability of the structure (an example is provided in Figure 5);

G. Carry out the probabilistic risk (RP) management of the anomalies, as described at the end of section 3, and define the acceptable, tolerable and unacceptable risk zones;

H. Carry out the failure analysis for the unacceptable risks, defining the indexes as described in Tables 3 to 5 of section 3.1, and establish the RPN following the example in Table 6;

I. Define the severe, intermediate and mild failures on a matrix like the one shown in Figure 6, and define the action plan.
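Stages F and G can be illustrated with a minimal event tree sketch. It assumes the usual event tree convention: the probabilistic risk of each scenario is the anomaly probability (PA) multiplied by the conditional probabilities of the branches along its path. The exact formulation of the proposed method is given in section 3 and may differ. The event names and numbers are hypothetical, and each level is assumed to offer the same branches regardless of the previous outcome.

```python
from itertools import product
from math import isclose, prod


def scenario_risks(
    pa: float, levels: list[dict[str, float]]
) -> dict[tuple[str, ...], float]:
    """Return the probability of every root-to-leaf scenario of the tree."""
    for level in levels:
        if not isclose(sum(level.values()), 1.0):
            raise ValueError("branch probabilities at each level must sum to 1")
    risks = {}
    for path in product(*(level.items() for level in levels)):
        outcomes = tuple(name for name, _ in path)
        # Assumption: RP = PA x product of the branch probabilities.
        risks[outcomes] = pa * prod(p for _, p in path)
    return risks


# Hypothetical anomaly with PA = 0.2 and two event levels.
tree = [
    {"progresses": 0.3, "stabilizes": 0.7},
    {"global instability": 0.4, "local instability": 0.6},
]
risks = scenario_risks(0.2, tree)
for outcomes, rp in sorted(risks.items(), key=lambda kv: -kv[1]):
    print(" -> ".join(outcomes), f"RP = {rp:.3f}")
```

Because the branch probabilities at each level sum to 1, the scenario probabilities add up to PA, which gives a quick check on a hand-built tree.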

In the case of Dam B, for example, 14 anomalies were identified during the field inspection, as shown in Table 7. None of them is critical, that is, none has a danger level equal to or greater than 2 (alert or emergency, which require immediate intervention or repair). Table 8 shows the ordering of the anomalies and the definition of the anomaly probability and failure mode.

Table 7
Dam B anomalies.
Table 8
Dam B anomaly probability and failure mode.

For the anomalies of Dam B, 150 probabilistic risk scenarios were identified through the event tree analysis of the proposed method, as in the example in Figure 7 and Table 9. Of these scenarios, 6 were considered unacceptable, as shown in Table 10. These were submitted to FMEA-type failure analysis, which identified 4 intermediate and 2 mild failures, as shown in Table 11 and Figure 8.

Figure 7
Event tree for anomaly A.3 of Dam B.
Table 9
Probabilistic anomaly description for anomaly A.3 for Dam B.
Table 10
Dam B ordering of unacceptable risks.
Table 11
Failure analysis for the unacceptable risk at Dam B.
Figure 8
Matrix of failure analysis for the unacceptable risk at Dam B.

A summary of the risk and failure analyses obtained by applying the proposed method to the three dams is presented in Table 12.

Table 12
Application summary of proposed method.
  • For the case of dam A:

    • Inspection: 50 anomalies, 35 of which are critical, but with no indication of the probability that their progression will lead to a failure mode.

    • Risk analysis: breakdown of the 50 anomalies into 749 failure scenarios due to local or global instability, for a given failure mode. Of this total, 45 are prioritized as unacceptable risks, that is, they must be mitigated immediately.

    • Failure analysis: of the 45 unacceptable risks, none corresponds to a severe failure, and the 33 intermediate failures must be prioritized over the 12 mild ones.

  • For the case of dam B:

    • Inspection: 14 anomalies, none of which is critical, but with no indication of the probability that their progression will lead to a failure mode.

    • Risk analysis: breakdown of the 14 anomalies into 150 failure scenarios due to local or global instability, for a given failure mode. Of this total, 6 are prioritized as unacceptable risks, that is, they must be mitigated immediately.

    • Failure analysis: of the 6 unacceptable risks, none corresponds to a severe failure, and the 4 intermediate failures must be prioritized over the 2 mild ones.

  • For the case of dam C:

    • Inspection: 46 anomalies, 4 of which are critical, but with no indication of the probability that their progression will lead to a failure mode.

    • Risk analysis: breakdown of the 46 anomalies into 634 failure scenarios due to local or global instability, for a given failure mode. Of this total, 17 are prioritized as unacceptable risks, that is, they must be mitigated immediately.

    • Failure analysis: of the 17 unacceptable risks, none corresponds to a severe failure, and the 14 intermediate failures must be prioritized over the 3 mild ones.
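The figures listed above can be cross-checked with a short script. It is only a sketch: the counts are taken from the summary above, while the dictionary layout and function names are illustrative. It computes the share of scenarios classified as unacceptable for each dam and verifies that the severe, intermediate and mild failures add up to the number of unacceptable risks.

```python
# Counts taken from the summary of the three dams.
DAMS = {
    "A": {"anomalies": 50, "critical": 35, "scenarios": 749,
          "unacceptable": 45, "severe": 0, "intermediate": 33, "mild": 12},
    "B": {"anomalies": 14, "critical": 0, "scenarios": 150,
          "unacceptable": 6, "severe": 0, "intermediate": 4, "mild": 2},
    "C": {"anomalies": 46, "critical": 4, "scenarios": 634,
          "unacceptable": 17, "severe": 0, "intermediate": 14, "mild": 3},
}


def unacceptable_share(dam: str) -> float:
    """Fraction of the failure scenarios classified as unacceptable."""
    row = DAMS[dam]
    return row["unacceptable"] / row["scenarios"]


def is_consistent(dam: str) -> bool:
    """Check that the failure classes add up to the unacceptable risks."""
    row = DAMS[dam]
    return row["severe"] + row["intermediate"] + row["mild"] == row["unacceptable"]


for dam in DAMS:
    print(f"Dam {dam}: {unacceptable_share(dam):.1%} unacceptable, "
          f"consistent = {is_consistent(dam)}")
```

The unacceptable scenarios amount to roughly 6.0%, 4.0% and 2.7% of the scenarios of dams A, B and C, respectively, which shows how far the method narrows the set of items that the action plan must address first.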

5. Conclusions

The frequency and severity of dam failures are increasing globally, and most of them would be preventable if dam owners and operators exercised due diligence. The technical knowledge exists to allow dams to be built and operated at low risk, but the frequency of ruptures points to lapses in the consistent application of expertise throughout the life of an installation and to a lack of attention to detail. In Brazil, professional practice and regulatory guidance allow unbridled confidence in the observational method, a continuous, managed and integrated process of design, construction control and monitoring of structures. For many of the failures, the reports indicate a series of construction flaws in the filter and drain systems, concrete galleries and concrete bypass channels, in addition to critical operational issues over the years of operation. Best practices, best knowledge and best available techniques need to be the main guidelines for the planning, design, construction, operation, monitoring and closure plan of dams. As these guidelines become clear and are applied, the industry will no longer depend on assumptions about observational methods, which rely on the expertise and particular point of view of engineers and consultants to make important decisions that affect risk.

All structures present some degree of risk, even after control; it is therefore necessary to develop action plans, or contingency plans, to minimize such risks. These plans should address the actions needed to minimize the risk and to mitigate the consequences should the event occur, as well as issues of responsibility and emergency response. The proposed method reduces the subjectivity of filling in dam inspection forms and expands the understanding of which anomalies are most significant from the point of view of triggering a failure mode. In this way, it allows the action plan to be prioritized more assertively, reducing operation time and maintenance costs, as it increases the effectiveness of anomaly control in dam safety management.

The application of the proposed method for the case study of the three dams allows us to conclude that:

  • An action plan based only on the observations of the inspection form is subjective, since it depends on the professional's expertise in assessing how far the structure is compromised by a given anomaly;

  • As levels of magnitude (M) and danger level (NP) associated with each anomaly are defined, a pattern is created that reduces subjectivity in the classification of the anomaly and allows the identification of critical anomalies (NP ≥ 2) that require more immediate intervention. It is noteworthy that until this phase of the method it is not yet defined which anomaly represents the greatest risk, as it is a partial analysis;

  • When creating the probabilistic matrix of the combination of magnitude (M) with the danger level (NP), it is possible to find the anomaly probability (PA), an index for entry into the proposed method event trees. The higher the PA, the closer the anomaly is to the unacceptable risk scenario;

  • Identifying the failure mode (MF) makes it possible to relate the anomaly to the final instability process to which its progression leads, from both a local and a global point of view. Thus, when sequencing the event tree for each anomaly, the objective is always to predict all scenarios up to the defined failure mode;

  • The events in the event tree are broken down based on the observations of the inspection forms and the photographic report and, accordingly, it is important to detail each anomaly marked on the form. The higher the level of details, the more accurate the establishment of the percentages of each event and, consequently, the more assertive the calculation of the probabilistic risk (RP);

  • Classifying the probabilistic risks into unacceptable, tolerable and acceptable intervals allows the action plan to be directed towards more specific interventions, in order to mitigate the anomalies whose development leads most directly to the specified failure mode;

  • Failure analysis of the unacceptable risks prioritizes, within the action plan, the anomalies that lead to severe failures, followed by intermediate and mild failures. In this way, a robust, directive plan can be used to guide all dam safety management;

  • The action plan should include more immediate mitigating actions for severe failures than for anomalies that correspond to intermediate failures and, next in sequence, to mild failures;

  • The proposed method is a failure and risk analysis tool that must be used in conjunction with other techniques to guarantee the local and global stability of dams;

  • For the complete version of this methodology, see the PhD thesis available at Fernandes (2020).

List of symbols

β Reliability index

AU Increased

coV Coefficient of variation

CMP Maximum design flood

D Detection Index

DI Decreased

DPR Probabilistic Anomaly Description

DS Disappeared

FMEA Failure Mode and Effects Analysis

FS Safety Factor

G Large Magnitude

H Horizontal

I Insignificant Magnitude

M Safety margin

M Magnitude

M Average Magnitude

MF Failure Mode

N Number of events

NP Danger Level

O Occurrence Index

P Small Magnitude

PA Anomaly Probability

PC Remained Constant

PE Anomaly Probability per Event

Pf Probability of failure

PI Inspection Probability

PV First Time

RP Probabilistic Risk

RPN Risk Priority Number

S Severity Index

SD Standard deviation

TR Return Period

V Vertical

References

  • ABNT NBR 31000. (2018). Risk management: guidelines. ABNT - Associação Brasileira de Normas Técnicas, Rio de Janeiro, RJ (in Portuguese).
  • Brazendale, J., & Bell, R. (1994). Safety-related control and protection systems: standards update. IEE Computing and Control Engineering J., 5(1), 6-12. http://dx.doi.org/10.1049/cce:19940101
  • Costa, J.E. (1985). Floods from dam failures. Reston: U.S. Geological Survey.
  • Fernandes, R.B. (2017). Manual for preparing emergency action plans for mining dams (PAE) (1st ed.). Belo Horizonte: Instituto Minere (in Portuguese).
  • Fernandes, R.B. (2020). Methodology for risk management in dams from the Event Tree and FMEA Analysis [Doctoral thesis, Rio de Janeiro University State]. Rio de Janeiro University’s repository (in Portuguese). Retrieved in May 1, 2021, from http://www.labbas.eng.uerj.br/pgeciv/nova/files/teses/16.pdf
  • International Commission on Large Dams – ICOLD. (2001). Tailings dams risk of dangerous occurrences: lessons learnt from practical experiences (Bulletin, No. 121). Paris: ICOLD.
  • Kloman, H.F. (1992). Rethinking risk management. The Geneva Papers on Risk and Insurance - Issues and Practice, 17(3), 299-313. http://dx.doi.org/10.1057/gpp.1992.19
  • Lacasse, S., Nadim, F., Liu, Z.Q., Eidsvig, U.K., Le, T.M.H., & Lin, C.G. (2019). Risk assessment and dams: recent developments and applications. In Proceedings of the XVII European Conference on Soils Mechanics and Geotechnical Engineering (Vol. 1, pp. 1-26), Reykjavik, Iceland. https://doi.org/10.32075/17ECSMGE-2019-1110
  • Londe, P. (1995). Safety concepts applied to rock masses. In C. Fairhurst (Ed.), Analysis and design methods: principles, practice and projects (pp. 749-769). Minnesota: University of Minnesota. https://doi.org/10.1016/B978-0-08-040615-2.50035-6
  • U.S. Army Corps of Engineers – USACE. (2014). Engineering and design: safety of dams: policy and procedures, engineering (Manual, No. 1110-2-1156). Washington: USACE.
  • Whitman, R.V. (1984). Evaluating calculated risk in geotechnical engineering. Journal of Geotechnical Engineering, 110(2), 143-189. http://dx.doi.org/10.1061/(ASCE)0733-9410(1984)110:2(143)

Publication Dates

  • Publication in this collection
    29 Aug 2022
  • Date of issue
    2022

History

  • Received
    04 May 2021
  • Accepted
    22 July 2022
Associação Brasileira de Mecânica dos Solos Av. Queiroz Filho, 1700 - Torre A, Sala 106, Cep: 05319-000, São Paulo - SP - Brasil, Tel: (11) 3833-0023 - São Paulo - SP - Brazil
E-mail: secretariat@soilsandrocks.com