Risk analysis and reliability of the GERDA Experiment extraction and ventilation plant at Gran Sasso mountain underground laboratory of Italian National Institute for Nuclear Physics

The aim of this study is the risk analysis of an argon release from the GERDA experiment in the Gran Sasso underground National Laboratories (LNGS) of the Italian National Institute for Nuclear Physics (INFN). The GERDA apparatus, located in Hall A of the LNGS, is a facility with germanium detectors placed in a large tank filled with about 70 m³ of cold liquefied argon. This cryo-tank sits inside a second, water-filled tank (700 m³) at atmospheric pressure. In such cryogenic processes, the main cause of an accidental scenario is a loss of insulation of the cryo-tank. A preliminary HazOp analysis has been carried out on the whole system. The risk assessment identified two possible top events: an explosion due to a Rapid Phase Transition (RPT) and argon runaway evaporation. The risk analysis showed that the latter top event has the higher probability of occurrence. To avoid emission into Hall A, HazOp, Fault Tree and Event Tree analyses of the cryogenic gas extraction and ventilation plant have been carried out. Failures of the ventilation system are the main cause of the top event. To improve the system reliability, some corrective actions were proposed: the use of a UPS and the upgrade of the damper opening devices. Furthermore, the Human Reliability Analysis identified some operating and management improvements: optimization of action procedures, alert warnings and staff training. The proposed model integrates existing analysis techniques and applies them to an atypical work environment, yielding useful suggestions for improving the system reliability.


Introduction
According to many authors, to improve safety one has to know where the risks are (Pasman et al., 2009). This is certainly true when designing the safety of complex systems, where the predictive analysis of failure modes requires identifying the hazardous conditions, quantifying their probability of occurrence and defining representative accident scenarios. The representativeness of these scenarios depends on the knowledge of the production processes and system parts, and the quantitative risk analysis requires that all failure events be considered (Zhao et al., 2016).
The proposed methodologies have been separated into three different phases: identification, evaluation and hierarchization (Tixier et al., 2002).
During the identification phase of the accidental risk analysis, two conditions were identified: (a) a pressure rise in the cryostat above the design value and (b) a loss of the containment and insulation conditions of the cryogenic liquid. The dominating critical scenario is the mixing of the shielding water of the Water Tank with the cryogenic liquid (LAr), which leads to an explosive effect due to Rapid Phase Transition (RPT). This mode can be considered as a critical sub-system of containment loss (mixture of LAr and water). A second scenario is a significant rise of the cryogenic liquid evaporation above the functional values. In this case there is a risk of hypo-oxygenation and hypothermia, depending on how the release is managed. This scenario has the draining flow limit as its critical sub-system.
Based on the Safety Management System procedures currently in use at the Laboratories, the risk analysis for all the apparatus is a step-by-step complex procedure that must be completed and agreed upon before the installation of the experiment.
Based on all the available documentation, the proposed analysis focuses on the critical factors (evaluation phase) related both to the cryogenic liquid evaporation and to the functionality of the extraction/ventilation system, in order to identify, in the hierarchization phase, the possible failure causes and to evaluate the reliability of the system devices.
The "core" of the research consists of the application of industrial risk assessment techniques and methodology in a high-technology context and at a "prototype scale", as each experiment is unique in the world. Moreover, the attention to safety issues has to take into account the boundary conditions of the underground labs: the confined area and the proximity to a public motorway tunnel.

Gran Sasso National Laboratory (INFN)
INFN Gran Sasso National Laboratory (LNGS) is the largest underground laboratory in the world devoted to neutrino and astro-particle physics and it offers the most advanced underground infrastructures in terms of dimensions, complexity and completeness.
LNGS is funded by the National Institute for Nuclear Physics (INFN), the Italian Institution which coordinates and supports research in the field of elementary particles, nuclear and sub nuclear physics.
The laboratory is on one side of the 10-km long highway tunnel which crosses the Gran Sasso massif. It consists of three huge experimental halls (Hall A, Hall B and Hall C, each 100 m long, 20 m wide and 18 m high), connected by smaller galleries: car tunnel, truck tunnel and connecting tunnels.
Halls are equipped with all technical and safety equipment and plants necessary for the experimental activities and to ensure proper working conditions for underground users.
The 1400 m rock layer above the Laboratory represents a natural coverage that provides a cosmic ray flux reduction by one million times; moreover, the flux of neutrons in the underground halls is about a thousand times less than on the surface, due to the very small amount of uranium and thorium of the calcareous rock of the mountain.
The shielding against cosmic radiation provided by the rock coverage, together with the huge dimensions and the impressive basic infrastructure, makes the Laboratory unmatched in the detection of weak or rare signals, which are relevant for astro-particle, sub-nuclear and nuclear physics.
The research areas are:
- study of rare nuclear phenomena;
- study of the most penetrating components of cosmic rays;
- neutrino physics;
- dark matter.
LNGS is subject to the European Seveso III Directive (2012/18/EU): the underground labs are classified as a major accident hazard plant due to the presence of experiments using and storing considerable amounts of substances classified as dangerous for the environment. In accordance with Seveso III, LNGS has adopted a Safety Management System (SGS), and before starting any activity or new project/experiment, LNGS and the Experimental Collaborations must carry out a Safety Risk Analysis in order to evaluate the probability of occurrence of possible events and to guarantee the highest safety standards in a complex system such as the one in which LNGS operates.

The GERDA experiment cryogenic gas extraction
The GERDA experiment was proposed in 2004 as a new ⁷⁶Ge double-beta decay experiment at LNGS. The GERDA installation is a facility with germanium detectors made of isotopically enriched material. The detectors are operated inside a liquid argon shield: the GERDA experiment has been designed for the clean handling and the stable long-term operation of the Germanium Detector Array in a shield of liquefied gas, copper and water that suppresses the environmental radioactivity by a factor of ~10⁸.
The Ge detectors are lowered from the lock in the clean room into the center of a double-walled, vacuum-insulated cryostat (Ø = 4.2 m, H = 9 m), which is filled with 6.5×10⁴ liters of liquid argon (T = -175 °C). The cryostat (see Figure 1) is manufactured from 30 tons of selected stainless steel of low radioactivity; its vertical walls are covered with 16 tons of ultrapure copper. The cryostat adheres to the principle of "leak before break", has no penetration below the fill level and is certified for 1.5 bar overpressure, while actually being operated at 1.2 bar.
The shield is completed by a tank (Ø = 10 m, H = 9 m) filled with 580 m³ of ultrapure water; the water level rests at 8.4 m. The water not only suppresses the external gamma radiation but also moderates and absorbs neutrons very efficiently. It also serves as a radiator for a Cherenkov detector, which makes it possible to identify and veto the few muons, ~60 per hour, that penetrate through the Gran Sasso massif into the GERDA setup.
The water tank can be completely drained within less than 2 hours.
The water tank has been built around the cryostat from top to bottom: roof and topmost cylindrical ring have been built first and have then been lifted by the hall crane for the assembly of the next ring underneath.
All vertical surfaces within the water tank, including those of the cryostat, have been covered with a reflective and wavelength-shifting foil for improved detection of the Cherenkov light. The purple layer on the cryostat's wall consists of 6 mm thick extruded polystyrene foam serving as a thermal impedance, which limits the evaporation in case of a leak in the inner container. A similar barrier is mounted on the inner wall.
The final section of GERDA's own extraction and ventilation system is connected to the main general exhaust of the Underground Laboratories, which is served by two air pumping stations: the Assergi (AQ) station and the Casale San Nicola (TE) station. The Underground Laboratories ventilation system ensures the ejection of smoke and gases out of the laboratories up to a flow rate of ≈6×10⁴ m³/h. The LNGS ejection system of cryogenic gases for GERDA is designed to ensure a maximum flow rate of 10⁴ m³/h: the ejection point, into which all the cryogenic gases are directed and where the heat exchanger is installed, is close to the clean room at a height of 7.30 m. The underground laboratories ventilation system is managed by a "slow-control software".
Two motorized dampers are installed close to the connection between the ejection system and the underground laboratories ventilation system; furthermore, another damper is installed on the new ejection system to provide air ejection from the heat exchanger release point.
The system is equipped with AISI 304 tubes with thickness in compliance with regulations, and is structured with a support and anchoring system, manual air dampers and extraction grills, motorized air dampers, control and detection devices.
The GERDA extraction system has to guarantee a constant ejection at a flow rate of 2.5×10³ m³/h, rising up to 10⁴ m³/h in case of a "sudden" cryogenic gas release.

Hazard identification
The main cause of an accidental scenario is the lack of insulation of the cryo-tank.
A preliminary HazOp analysis has been carried out on the whole system; in particular, the risk assessment identified the two most critical top events:
- TOP EVENT 1: explosion due to a Rapid Phase Transition (RPT);
- TOP EVENT 2: argon runaway evaporation.
The RPT explosion is due to the contact between the liquid argon and the water, with the production of a large amount of gaseous argon and a shock wave. In this case, the loss of insulation of the cryostat could be due to an overpressure or to a crack in the vessel (e.g. human error in the welding phase, wet corrosion, etc.).
The use of an "intrinsically safe" cryostat made it possible to reduce the estimated probability of occurrence of an RPT to 10⁻⁸ events/year (Guarascio et al., 2013). The GERDA cryostat, in fact, is built with suitable materials (leak before break), with a double wall and double containment.
The argon runaway evaporation beyond functional values leads to asphyxiation and hypothermia risks.
The estimated thermal power of the argon is 5 kW/m², with an initial evaporation rate of 10⁴ m³/h. A double polycarbonate layer (LEXAN) is applied to the inner and outer walls of the cryostat, in order to reduce the thermal transmission coefficient and to achieve a better insulation of the cryostat.
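To give a sense of scale for this scenario, the following sketch estimates how much gas the full argon inventory would produce, how long the GERDA extraction line would need to remove it at its maximum rate, and how much liquid would be enough to lower the oxygen level in Hall A. The liquid and gas densities are approximate handbook values rather than GERDA data, Hall A is treated as a simple 100 m × 20 m × 18 m box, and the 18% oxygen threshold and the no-ventilation displacement model are illustrative assumptions.

```python
# Rough scale estimate for an argon release. Constants marked "assumed" are
# approximate handbook values or illustrative choices, not GERDA data.

LAR_VOLUME_M3 = 65.0                    # liquid argon inventory, 6.5e4 liters (from the text)
EXTRACTION_RATE_M3_H = 1.0e4            # maximum GERDA extraction flow rate (from the text)
HALL_A_VOLUME_M3 = 100.0 * 20.0 * 18.0  # Hall A approximated as a box (dimensions from the text)

RHO_LIQUID_KG_M3 = 1395.0   # assumed: liquid argon density near its boiling point
RHO_GAS_KG_M3 = 1.66        # assumed: argon gas density near 20 C and 1 atm
O2_NORMAL = 0.209           # oxygen volume fraction in normal air
O2_THRESHOLD = 0.18         # assumed alarm threshold, for illustration only


def gas_volume_m3(liquid_m3: float) -> float:
    """Volume of warm argon gas produced by evaporating liquid_m3 of liquid."""
    return liquid_m3 * RHO_LIQUID_KG_M3 / RHO_GAS_KG_M3


def oxygen_fraction(gas_m3: float, room_m3: float) -> float:
    """Oxygen fraction if the gas displaces an equal volume of air (no ventilation)."""
    displaced = min(gas_m3, room_m3)
    return O2_NORMAL * (room_m3 - displaced) / room_m3


def liquid_volume_for_threshold(room_m3: float, threshold: float) -> float:
    """Liquid volume whose evaporation lowers the oxygen fraction to threshold."""
    gas_m3 = room_m3 * (1.0 - threshold / O2_NORMAL)
    return gas_m3 * RHO_GAS_KG_M3 / RHO_LIQUID_KG_M3


total_gas = gas_volume_m3(LAR_VOLUME_M3)
hours_to_extract = total_gas / EXTRACTION_RATE_M3_H
critical_liquid = liquid_volume_for_threshold(HALL_A_VOLUME_M3, O2_THRESHOLD)

print(f"gas from full inventory: {total_gas:,.0f} m3")
print(f"time to extract at maximum rate: {hours_to_extract:.1f} h")
print(f"liquid volume reaching {O2_THRESHOLD:.0%} O2 in Hall A: {critical_liquid:.1f} m3")
```

Under these assumptions the full inventory corresponds to a gas volume larger than the hall itself, and a small fraction of the inventory would already be enough to reach the illustrative oxygen threshold if no extraction were available.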
The hazard analysis conducted before the approval and the installation of the GERDA experiment gave the results reported in Table 1 (Hazard analysis results).
According to Italian Legislative Decree 334/99, all the events with a probability of occurrence greater than 10⁻⁶ events/year have been considered (Guarascio et al., 2013).
Among the different events identified in the Risk Assessment, Top Event 2 certainly has the higher probability of occurrence. In fact, during the whole design and analysis process, several structural measures have been put into practice:
- the original single copper wall cryostat has been replaced by a double-wall stainless steel one;
- the two stainless steel walls are mutually independent, guaranteeing a "double containment wall all over";
- the vacuum gap between the walls is monitored;
- the cryostat has been "coated" with a Lexan layer on both the inner and the outer side, and with a Mylar layer on the outer side, completely "wrapping" the whole cryostat;
- a thermo-mechanical analysis shows that the risk of a leak from one wall as a consequence of a single break in the other one is drastically reduced.
For the above reasons, the RPT event turned out to be extremely unlikely (< 10⁻⁸ events/year), and attention has been focused on TOP EVENT 2 (Marcoulaki et al., 2016), deepening the analysis and proposing technical improvements.
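The screening logic used here can be illustrated with a few lines of code. The sketch below compares the two top events with the 10⁻⁶ events/year threshold cited above; the RPT value is the upper bound quoted in this section, and the value for Top Event 2 is the one stated in the Conclusion. The function name and data layout are illustrative choices, not part of the original analysis.

```python
# Screening of the top events against the probability threshold quoted in the text.
THRESHOLD_PER_YEAR = 1e-6  # threshold associated with Legislative Decree 334/99 in the text

# Estimated frequencies (events/year) as reported in this paper.
top_events = {
    "TOP EVENT 1: RPT explosion": 1e-8,              # upper bound quoted in the text
    "TOP EVENT 2: argon runaway evaporation": 1e-4,  # value stated in the Conclusion
}


def screen(events, threshold):
    """Return the events whose frequency exceeds the threshold, most likely first."""
    retained = {name: freq for name, freq in events.items() if freq > threshold}
    return sorted(retained.items(), key=lambda item: item[1], reverse=True)


retained = screen(top_events, THRESHOLD_PER_YEAR)
for name, freq in retained:
    print(f"{name}: {freq:.0e} events/year")
```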

HAZOP Analysis of the extraction and ventilation plant
Hazards related to the extraction and ventilation plant have been identified as follows: the plant has been divided into nodes, and for each node process parameters and guide words have been applied (Groth et al., 2012).
The primary objective of the plant is to ensure, in an emergency situation, the extraction of gaseous argon up to a flow rate of 10⁴ m³/h. Considering the scope of the plant, the HazOp Analysis (Stefana et al., 2015) highlighted that a lack of flow in the piping could be due to the fan seizing or to an incomplete opening of the shut-off dampers, caused by human error or component breakdown. According to the human reliability analysis, the critical events and the corresponding causes leading to the unsuccessful operation of the plant are reported in Table 2.
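As an illustration of how nodes, process parameters and guide words combine into the deviations that a HazOp team reviews, a minimal sketch follows. The node and parameter names are hypothetical examples chosen for readability; the nodes actually analyzed are those summarized in Table 2. The guide words are a small subset of those commonly used in HazOp studies.

```python
from itertools import product

# Hypothetical nodes and parameters, for illustration only; they are not
# the node list of the GERDA HazOp.
nodes = ["extraction fan", "shut-off dampers", "exhaust piping"]
parameters = ["flow", "pressure", "temperature"]
guide_words = ["NO", "LESS", "MORE", "REVERSE"]  # common HazOp guide words


def deviations(nodes, parameters, guide_words):
    """Enumerate every (node, deviation) pair to be reviewed by the team."""
    return [
        (node, f"{word} {parameter}")
        for node, parameter, word in product(nodes, parameters, guide_words)
    ]


worksheet = deviations(nodes, parameters, guide_words)
print(len(worksheet))  # 3 nodes x 3 parameters x 4 guide words
print(worksheet[0])
```

In practice the team discards the combinations that have no physical meaning for a given node and records causes, consequences and safeguards only for the remaining ones, such as the "NO flow" deviation discussed in this section.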

Fault Tree Analysis-FTA
This section presents the Fault Tree Analysis (Guarascio et al., 2007) of the Top Event identified by the previous HazOp: gaseous argon release in Hall A.
For the event to occur (Brighton et al., 1994), both the incomplete opening of the air dampers and the failure of the extraction system have to occur. For this reason the FTA has two branches, connected by the logic gate "AND": failure of the extraction and ventilation plant and incomplete opening of the dampers.

Figure 2
The GERDA extraction plant layout.
The system is equipped with liquid leak, temperature variation and oxygen deficiency detectors (Crowl and Louvar, 1990), connected to the corresponding optical and/or acoustic alarms (see Figure 2). In case of failure of the mechanized extraction system, the operator activates the P2 command and then the P3 one. The output of the Fault Tree Analysis is summarized below, according to Figure 3. The failure modes are used to evaluate the probability of occurrence of the top events (Khan et al., 2001).

First branch: incomplete opening of the dampers
The main causes of the incomplete opening of the dampers, represented by the logic gate "OR", are:
- Damage of mechanical device.
- Failure of the damper activation system.
The damage of mechanical device is a "basic event", representing a final cause without sub-events.
The failure of the damper activation system could be due to two main causes, represented by the logic gate "OR":
- Facilities failure.
- No signal for the damper opening.
The facilities failure is a "basic event" representing a final cause without sub-events.
The absence of signal for the damper opening could be due to two main causes, represented by the logic gate "OR":
- Sensor warning failure.
- Failure on the activation of the opening command.
The sensor warning failure could be due to three main causes, represented by the logic gate "OR":
- PLC (Programmable Logic Controller) damage.
- Sensor damage.
- Wiring damage.
These three events are "basic events" representing a final cause without sub-events.
The failure on the activation of the opening command could be due to two main causes, represented by the logic gate "OR":
- Alarm system damage.
- Human error: wrong reaction to the alarm warning.
Both events are "basic events", and the human error has been analyzed by the Human Reliability Analysis (HRA).

Second branch: failure of the extraction system
The main causes of the failure of the extraction system, represented by the logic gate "OR", are:
- Ventilation system failure.
- Absence of electric power supply.
The failure of the ventilation system could be due to two main causes, represented by the logic gate "OR":
- Fan jam.
- Electric motor failure.
Both these events are "basic events" representing a final cause without sub-events.
The absence of electric power supply could be due to two main causes, represented by the logic gate "OR":
- Error on the start command activation.
- Lack of energy power.
The error on the start command activation could be due to two main causes, represented by the logic gate "OR":
- Human error.
- Alarm system damage.
Both events are "basic events", and the human error has been analyzed in the Human Reliability Analysis (HRA).
The lack of energy power could be due to two main causes, represented by the logic gate "OR":
- Wiring damage.
- Energy shut-down.
Both events are "basic events" for the TOP EVENT and complete the structure of the FTA.
The basic events Fan jam, Wiring Damage, Energy shut-down, Electric Motor Failure, Human Error and Alarm system Damage have the greatest influence on the probability of occurrence of the top event. Therefore:
- a small improvement of the electric line reliability yields a large improvement of the entire system reliability;
- once the top event occurs, the lack of energy power has a probability of occurrence equal to 1.
The FTA results suggest the use of a UPS, improved maintenance of the damper opening system and the optimization of the intervention and training procedures.
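The gate structure described in this section can be evaluated numerically as sketched below. The tree layout follows the two branches listed above, while every probability is a hypothetical placeholder and not a GERDA value. The sketch also assumes that all basic events are independent and that events named in both branches (human error, alarm system damage, wiring damage) are distinct events in each branch; the ranking it prints depends entirely on these placeholder numbers.

```python
# Fault tree sketch. The gate structure follows the description above;
# every probability is a hypothetical placeholder, NOT a GERDA value.
# Basic events are assumed independent, and events that appear in both
# branches are treated as distinct events of each branch.
from math import prod


def OR(*p):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - x for x in p)


def AND(*p):
    """Probability that all independent input events occur."""
    return prod(p)


BASE = {
    "mechanical_damage": 1e-3, "facilities_failure": 1e-3,
    "plc_damage": 1e-3, "sensor_damage": 1e-3, "sensor_wiring": 1e-3,
    "alarm_damage_1": 1e-3, "human_error_1": 5e-2,
    "fan_jam": 1e-3, "motor_failure": 1e-3,
    "human_error_2": 5e-2, "alarm_damage_2": 1e-3,
    "power_wiring": 1e-3, "energy_shutdown": 1e-2,
}


def top_event(p):
    # First branch: incomplete opening of the dampers.
    sensor_warning = OR(p["plc_damage"], p["sensor_damage"], p["sensor_wiring"])
    command = OR(p["alarm_damage_1"], p["human_error_1"])
    no_signal = OR(sensor_warning, command)
    activation = OR(p["facilities_failure"], no_signal)
    dampers = OR(p["mechanical_damage"], activation)
    # Second branch: failure of the extraction system.
    ventilation = OR(p["fan_jam"], p["motor_failure"])
    start_error = OR(p["human_error_2"], p["alarm_damage_2"])
    lack_of_power = OR(p["power_wiring"], p["energy_shutdown"])
    no_power = OR(start_error, lack_of_power)
    extraction = OR(ventilation, no_power)
    return AND(dampers, extraction)


def sensitivity(p, factor=0.5):
    """Relative drop of the top event when one basic event is made less likely."""
    base = top_event(p)
    drops = {}
    for name in p:
        improved = dict(p, **{name: p[name] * factor})
        drops[name] = 1.0 - top_event(improved) / base
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


print(f"top event probability: {top_event(BASE):.2e}")
for name, drop in sensitivity(BASE)[:3]:
    print(f"{name}: -{drop:.1%}")
```

Replacing the placeholder values with plant-specific failure data would turn the same structure into a quantitative check of which basic events dominate the top event.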

Event Tree Analysis (ETA)
The Event Tree proceeds with another two branch points, in order to verify the effectiveness of the control and regulation valve systems, and of the extraction system activated both automatically and by the operator.
Once the argon is released from the cryostat, the main cause of the argon emission in Hall A is the failure of the extraction system.
Safety measures related to the opening of the dampers consist of the automatic activation of the control and regulation valves and of the system for the extraction of cryogenic gases. Furthermore, the operator can directly apply these safety measures, in redundancy with the automatic system.
The experiment is equipped with liquid leak detectors, temperature and oxygen sensors.
In the ETA construction (Guarascio et al., 2013), two branch points have been considered in order to define the effectiveness of the first safety measure: the reliability of the safety measure itself is guaranteed by the correct operation of both the detector and the alarm systems. In the branch related to the correct operation of the alarm system, another branch point concerning the correct reaction of the operator is present.
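The branch structure described above (detector, then alarm, then operator reaction) can be sketched as a small event tree. The success probabilities below are hypothetical and only serve to show how the end-state probabilities are obtained by multiplying along each path.

```python
# Event tree sketch for the first safety measure. The branch order follows
# the text (detector, alarm, operator); all probabilities are hypothetical.
P_DETECTOR_OK = 0.99   # assumed: detector senses the argon release
P_ALARM_OK = 0.98      # assumed: alarm is raised given a detection
P_OPERATOR_OK = 0.90   # assumed: operator reacts correctly given an alarm


def event_tree(p_detector, p_alarm, p_operator):
    """Return the probability of each end state of the three-branch tree."""
    return {
        "detector fails": 1.0 - p_detector,
        "alarm fails": p_detector * (1.0 - p_alarm),
        "operator fails": p_detector * p_alarm * (1.0 - p_operator),
        "safety measure succeeds": p_detector * p_alarm * p_operator,
    }


outcomes = event_tree(P_DETECTOR_OK, P_ALARM_OK, P_OPERATOR_OK)
for state, prob in outcomes.items():
    print(f"{state}: {prob:.4f}")
```

Because the end states are mutually exclusive and exhaustive, their probabilities sum to one, which is a convenient consistency check when the tree is extended with further branch points.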

Branch point related to the correct operation of the detector system
The Event Tree proceeds with a branch point in order to verify whether the power-up of the systems succeeds or not. In this case, the reaction of the operator is not considered.
The ETA leads to the implementation of safety measures to prevent the main event (gaseous argon release):
- alerting the operator by means of optical and acoustic alarms in the control room, triggered by the sensors;
- full opening of the dampers in the main general exhaust duct, performed by the operator;
- starting the extraction electric motor at operating speed in order to convey the vaporized argon, performed by the operator.
Branch point related to the correct operation of the alarm system

Human Reliability Analysis (HRA)
HRA (Lombardi et al., 2014) has been carried out in order to deepen the evaluation of the actions performed by the operator in the control room. The task analysis covers the following steps: recognition and identification of the alarm warning, call to the suitable operative action, identification of the extraction system activation button.
The HRA (Sun et al., 2012) has identified the main operational conditions that influence the operator's work and predispose the operator to mistakes:
- excessive noise outside the operating room could prevent the alarm from being heard;
- wrong location of the optical alarm warning could prevent it from being noticed;
- absence of a feedback device informing the operator that the extraction system has been activated;
- incongruity between procedures and operational activities;
- incomplete layout of the instrumentation;
- monotony of the surveillance.
The event tree for the Human Reliability Analysis (Ying et al., 2010), regarding the reaction in case of an alarm warning, is shown in Figure 4, where:
- small letters represent the success of the operation;
- capital letters represent the human behavior;
- Greek letters represent the operation of the protection systems.
Considering the failure probabilities and the operator's mistake probabilities (Holmes et al., 1998), the overall human failure probability (G_total) is equal to 0.092 (see Table 4).
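One common way to combine step-level error probabilities into an overall task failure, not necessarily the one used to obtain Table 4, is to require that every step of the task succeeds. The sketch below applies this to the three task steps named above; the per-step error probabilities are hypothetical placeholders and are not meant to reproduce the value of G_total reported in the paper.

```python
from math import prod

# Task steps named in the text; the error probabilities are hypothetical
# placeholders and do not reproduce the values of Table 4.
step_error = {
    "recognize and identify the alarm warning": 0.03,
    "call the suitable operative action": 0.02,
    "identify the extraction system activation button": 0.01,
}


def total_human_failure(errors):
    """Failure probability of a task that needs every step to succeed."""
    return 1.0 - prod(1.0 - p for p in errors.values())


g_total = total_human_failure(step_error)
print(f"G_total = {g_total:.3f}")
```

This series model assumes independent steps and no recovery actions; an event tree with recovery branches, such as the one in Figure 4, would generally give a different result.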

Conclusion
The risk analysis procedure (Stamatelatos and Dezfuli, 2011) has been performed on the plants of the GERDA Experiment, located in Hall A of the Underground Gran Sasso National Laboratory. By combining different and complementary methodologies, the Risk Assessment has ensured a better analysis of the system and related hazards, also analyzing the effect of human behavior on possible failure modes.
In particular, the event "Argon runaway evaporation" has been analyzed: its probability of occurrence is equal to 10⁻⁴ events/year, and the Top Event "Gaseous Argon Release in Hall A" is the most critical one.
Based on the analysis (Pasman et al., 2009), three main corrective actions have been adopted in order to increase the reliability of the system. The actions comprise both technical and managerial measures, as follows:
1. Structural intervention: addition of a UPS (Uninterruptible Power Supply) in the extractor electricity supply system. This measure guarantees a relevant reduction of the probability of occurrence of the Top Event [value of probability between 6.26×10⁻² and 1.23×10⁻² events/year].
2. Preventive maintenance approach for the control, regulation and activation device that opens the dampers: it is an electromechanical device subject to failure during its lifetime, and periodic inspections and maintenance can reduce the probability of the Top Event.
3. Periodic information and training of the operating staff in charge of controls: continuous training is crucial in order to reduce the probability of mistakes and to ensure a higher level of attention in the staff's operations.
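The effect of the first corrective action can be illustrated with a simple redundancy calculation. In the sketch below the UPS is modeled as an independent backup that must also fail for the extractor to lose power, which corresponds to placing an "AND" gate above the energy shut-down event. Both probabilities are hypothetical and are not the values behind the range quoted above.

```python
# Effect of a UPS on the "energy shut-down" basic event. Both numbers are
# hypothetical; the UPS is modeled as an independent backup that must also
# fail on demand for the extractor to lose power.
P_MAINS_SHUTDOWN = 1e-2   # assumed probability of losing the mains supply
P_UPS_FAILS = 5e-2        # assumed probability that the UPS fails on demand


def power_loss(p_mains, p_ups=None):
    """Probability of losing power, with an optional independent UPS backup."""
    if p_ups is None:
        return p_mains
    return p_mains * p_ups  # AND gate: mains and UPS both unavailable


without_ups = power_loss(P_MAINS_SHUTDOWN)
with_ups = power_loss(P_MAINS_SHUTDOWN, P_UPS_FAILS)
print(f"without UPS: {without_ups:.1e}")
print(f"with UPS:    {with_ups:.1e}")
```

The independence assumption is optimistic if the mains and the UPS share a common cause of failure, so a real assessment would need to account for such dependencies.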
According to the evidence of available statistical reports, non-compliant human behavior is the most relevant element in the causal analysis of the system failure (Duijm et al., 2006).
Based on the results of this back analysis, several control procedures are necessary in order to decrease failure occurrences (Sun et al., 2011).
The next step of this research will be the evaluation of the human error according to the integrated techniques of fault tree analysis and Ishikawa's model, tested on the procedures usually employed.

Figure 1
Labeled view of the GERDA installation.

Figure 3
Results of Fault Tree Analysis (FTA).

Figure 4
Human Reliability Analysis. Reaction in case of alarm warning.

Table 2
Synthesis of the HazOp Analysis.

Table 3
Characteristics of the system and Conditions for the FTA development.