Revisiting the P-36 oil rig accident 15 years later: from management of incidental and accidental situations to the organizational factors

The accident with the P-36 rig in the Campos Basin, Rio de Janeiro, Brazil, was one of the petroleum industry's major international disasters. Our aim in this reflection is (a) to verify, based on a specific case, the role of the human dimension in the reliability of highly complex systems, with a focus on the management of incidental and accidental situations capable of leading to large-scale accidents. In pursuing this aim, we are led to the need (b) to give visibility to the intervention of some organizational factors as elements that can increase the risk level of work on offshore rigs, taking the analysis beyond the so-called immediate causes. As for research methods, we relied mainly on document research (especially the reports by Petrobras, ANP/DPC, and CREA-RJ) and on conversations with three professionals who worked on the P-36. The results indicate that the management of incidental and accidental situations, which encompasses decision-making in emergency contexts, should draw on the contribution that workers can make in identifying and discussing certain process gaps with managers, through shared and more flexible decision-making and the collective analysis of risk situations. They also indicate that certain organizational factors contributed to the disaster, corroborating Brazilian and international studies on major accidents that point to the need for a change in the approach adopted by management in companies in the petroleum sector.


Introduction
The accident with the P-36 offshore oil rig on March 15, 2001, was one of the petroleum industry's worst international disasters 1 , resulting in 11 deaths and total loss of the rig, whose maximum production capacity was projected at 180,000 barrels per day, with an estimated financial loss of BRL 1 billion (more than USD 400 million at the 2001 exchange rate). The technological progress of Petrobras contrasted starkly with a string of serious accidents 2 that exposed gaps in the company's performance in health, safety, and the environment.
This article is part of a series of scientific studies in the research project coordinated by two of the authors, addressing the relationship between "work, health, and safety" in the petroleum industry, with a focus on offshore exploration and production in the Campos Basin, still Brazil's largest petroleum producing region, located off the northern coast of Rio de Janeiro State. We take a concise, focused approach (different from that of Figueiredo 3 ) to reflect on the disaster 15 years later, attempting to revisit some of the links in the chain of events that led to the accident. Based on the event, the aim is (a) to verify the role of the human dimension (individual and collective) in the reliability of highly complex systems, with a focus on the management of incident and accident situations with the potential for leading to large-scale accidents. We will specifically address the action by the fire brigade, which was criticized in some of the reports analyzed here. This analysis also highlights the need (b) to elucidate the organizational factors that can increase the risk level on offshore rigs, beyond the so-called immediate causes (human error and technical failures).
The debate reemerged in the Campos Basin in February 2015 (while Petrobras was at the center of the huge economic and political crisis gripping Brazil), when an explosion occurred on the Cidade de São Mateus FPSO, echoing the past and leaving 9 dead and 26 injured, some seriously. The vessel was chartered by Petrobras (owner of the exploration block), but operated by the Norwegian company BW Offshore (actually, in charge of the oil and gas pumping). The report issued by the Brazilian National Petroleum Agency (ANP) in August 2015 left no doubt as to the weight of the management failures in the incidental situation and organizational factors as causal elements contributing to the accident and its grave consequences 4 . A severe warning had been sounded in the international petroleum industry five years earlier, in April 2010, when such factors were also present in the genesis and unfolding of the Deepwater Horizon accident 5 , with even worse consequences: 11 dead, 17 injured, total loss of the rig, and the worst environmental disaster in the history of the Gulf of Mexico.

Theoretical framework and methods
The underlying theoretical and methodological framework for our analyses over the course of the research project is based primarily on the Ergonomics of Activity 6,7,8 and Psychodynamics of Work 9,10 . We also take a synergistic approach involving both scientific knowledge and practical experience, pertinent to the analysis of work situations, in line with the ergologic perspective 11,12 . The current article emphasizes the Ergonomics of Activity, supported when necessary by contributions from other references according to the context's unique characteristics and problems.
The methodology mainly involved document research, especially the reports produced by the Petrobras Internal Inquiry Commission 13 , Brazilian National Petroleum Agency/Ports and Coastal Police (ANP/DPC) 14 , and Rio de Janeiro Regional Board of Engineering and Agronomy (CREA-RJ) 15 , which are the result of the inquiries conducted by the Commission and the latter two agencies, concluded in June, July, and September 2001, respectively. In the report by CREA-RJ, the workers were represented by some of their union leaders and were able to participate more effectively. Our references also included the limited amount of information on the disaster published in scientific articles, theses, and dissertations. We also analyzed the news stories in the bulletins of the Union of Petroleum Workers of Northern Rio de Janeiro State (Sindipetro-NF), the Petroleum Workers' Federation (FUP), and the mainstream press, in a large clipping file organized by the union.
Our understanding of the system's functioning was greatly facilitated by various interviews, on different occasions and some years after the accident, with three professionals who had worked on the P-36 rig ever since it arrived at the Mauá Shipyard (the article only cites the operator referred to as "A"). Their experience was extremely useful for clarifying details on the system's functioning, besides giving us at least a partial grasp of the so-called "real work organization", decisive for minimizing our lack of knowledge on the rig's daily work routine.
Cad. Saúde Pública 2018; 34(4):e00034617

Reliability and management of high-risk incident and accident situations
Despite the technical systems' high complexity, several authors 16,17 had already highlighted in the early 1990s that their overall reliability had stagnated or even decreased in some cases, especially regarding the management of critical incidental situations, in particular the safety systems meant to protect themselves from known failures.
In fact, under so-called "normal" situations, the system could at best be managed in automatic mode. However, if certain malfunctions emerged, requiring human intervention, especially if such malfunctions were rare, they would be dealt with by operators who had lost their expertise, with little information about how the preceding events had unfolded (since they had been managed automatically).
Such "expert systems" can be considered a kind of extension, in the field of cognition, of the efforts made in the domains of automation and "traditional" information technology, aimed at gradually replacing all human intervention, seen as the main source of complex systems' "unreliability". Given the limits evidenced by this perspective, the development of a new proposal for the design of assisted systems gained increasing space, revaluing the human operator's role and aiming to define cooperative problem-solving environments. The goal was to define the operator's respective role in the system 18 .
Importantly, in the situations described above, the operator has to make decisions under extreme time pressure, and all these conditions together can increase the likelihood of human error. The focus is on the human factor as the "weak link" in systems and on human error as the cause of serious malfunctioning in large modern systems. This recurrence is linked not only to the systems' growing complexity, as mentioned above, but also to the intrinsic difficulties in the analysis of the accident as a phenomenon 19,20,21 . Neboit 19 highlights that in complex systems, a difficult task is to elucidate the unwanted events or so-called "latent conditions" present at different levels (organizational, communicational, decision-making, etc.), since their origin is often far-removed in time and place from the actual accident. Thus, accidents can be viewed as the result of the combination of "active failures" (unwanted conducts, clearer and closer to the end of the system, involving field operators, control room, and maintenance crews) and "latent conditions", which function as conditions of the surroundings or contextualization.
The same author 19 also states that the error is a deviation, and that regulation is based on this deviation. It is a means for regulation vis-à-vis the existing variabilities: in the environment, in the technical systems, in the tasks themselves, and in the operator's status. It is important to keep in mind that the error is the result of an adaptive system's functioning, which requires the permanent construction of commitments 22 . Thus, to understand the error and prevent or manage it involves understanding the paths and determinants that led to (or are prone to leading to) the circumstantial failure of this process of adaptation. Meanwhile, the determinants should be investigated not in relation to the operator, but mainly in the conditions for performing his activity (technical, organizational, social, etc.). In a broad sense, if we wish to understand accidents, it is crucial to understand the work 23 .

The impact of the organizational factors
The previous line of argument highlights the relevance of proposals for a two-way approach involving the microscopic examination of the work activity and the macroscopic analysis of social life 12 . Such reasoning is supported by Amalberti 22 when he contends that the "ten golden rules" for systemic safety should unfold at three levels: macro (the system), medium (the company), and micro (the job).
These dynamics lead us back in the series of events along a longer timeline, addressing the proposals by some experts in the analysis of large-scale accidents, focusing on some of the so-called organizational factors. Llory 24 is categorical on this point in focusing on the disaster with the Challenger space shuttle. According to the author, it is necessary to reexamine the past for a more rigorous understanding of the event, avoiding attempts to explain it by the more direct and immediate fatal mechanisms. It is necessary to examine the indirect underlying causes, which do not appear explicitly at first glance. This requires taking the stance of a "medical clinician, an analyst of organizations' functioning, the work of a historian or biographer, an analyst without foregone conclusions" 24 (p. 185).
One of Llory & Montmayeul's 21 fundamental references is Ergonomics. As Wisner 6 highlights, for a long time Ergonomics has shown the multiplicity and interrelationship between the intervening factors in major accidents, contrary to the customary approach of limiting the analysis to the internal factors in the establishment where the accident occurred.
Corroborating this view, Woolfson et al. 25 also contend that in order to understand the worst accident in the history of the offshore petroleum industry, on the Piper Alpha rig, it is important to focus on one of the issues featured in Lord Cullen's report. The Piper Alpha accident involved not only human errors and conflicting orders, but a long chain of events. The same authors point out that the disaster exposed the flaws in the offshore safety field. Likewise, to understand the Gulf accident requires looking at British Petroleum's track record, since the accident with the Texas City refinery in 2005 21,26 had already exposed some of the problems that appeared later on the Deepwater Horizon rig. Along the same line, to understand some aspects of the Fukushima tragedy 21 , one needs to observe the record of Tepco, the company operating that nuclear power plant. The same reasoning applies to the Petrobras case, shedding light on some of the company's policies in the years prior to the P-36 accident, like downsizing the workforce, expanding outsourcing, and organizational restructuring.
The P-36 oil rig was a complex social and technical system, and the system's technical dimension should not be underestimated. Still, based on the analysis of the management of incidental and accidental situations, we believe it is important to examine the extent to which one can identify some of the organizational factors at the origin of actions and decisions that contributed to the accident, which the report by the Petrobras Internal Inquiry Commission 13 only mentions as "recommendations" and "areas for improvement".

Brief description of the accident
On March 14, 2001, two non-routine operations were being performed: emptying the emergency drain tank (EDT) on the aft port, beginning at 22:21 hours, and preparation for inspection of the stability box. The oily water in the EDT was supposed to be pumped to the rig's production header, which receives the flow of oil and natural gas from the production wells. It was then supposed to flow off, together with the production of hydrocarbons, to the process plant. However, operational difficulties in starting up the bilge pump on that tank allowed a reverse flow of oil and gas through the tanks' flow-off lines, so that they entered the other EDT (aft starboard), since its intake valve allowed the oil and gas to pass (Figure 1), although it should have been closed. According to the Commission's report, it was not possible to confirm whether the passage was due to some damage to the valve or whether it was partially open. Operator A claimed categorically that it was closed.
The startup of the aft port EDT pump, after 54 minutes, considerably decreased the backflow of hydrocarbons. However, the water pumped out of the tank after the pump started also entered the aft starboard EDT, further increasing its pressure. Importantly, the booster pump on the aft starboard EDT had been removed for repair; this pump's air vent line, suction, and offload had been blinded (blocked); this tank's manual intake valve, as we mentioned, was not supposed to allow passage, as shown in the following illustration (Figure 1). This configuration resulted in the continuous pressurization of the aft starboard tank and its subsequent mechanical rupture some two hours after the beginning of the operation to empty the other tank (aft port). At 00:22 hours on March 15, 2001, a huge tremor was felt, similar to dropping a heavy load on the deck, due to the mechanical rupture of the starboard EDT (when it reached burst pressure). The resulting damage released oil, gas, and water from the tank into the column, besides causing the 18-inch saltwater tubing next to the tank to burst, starting the column's flooding. Due to this and other damage, the fire ring was depressurized and the process plant automatically entered emergency shutdown mode.

Figure 1
Flowchart of the emergency drain tank process involved in the first explosion on the P-36 rig.

The gas released from the tank reached the internal area of the tank top and main deck, activating the gas sensors. However, since the third- and fourth-level areas had not been classified as risk zones, the gas released after the explosion was not detected immediately in the tank compartment, which also explains why the oil and gas were not contained in this area, since there were no adequate containment devices or explosion-proof equipment (Figure 2).
The emergency brigade quickly deployed to the site, and some of the brigade members entered the column. The hatch from the third to the fourth level was opened for inspection of the lower compartments, where an intense hissing noise was detected, like leaking water, along with a thick white mist with no heat or flames. The inspection was hampered by lack of lighting in the area.
Approximately 17 minutes after the aft starboard tank burst mechanically, at 00:39 hours, there was a second, high-intensity explosion caused by the ignition of the natural gas released from the column, reaching the tank top and second deck. The explosion killed 11 members of the fire brigade and caused major destruction to the area located above the aft starboard column. After numerous unsuccessful attempts to correct the rig's heeling, it sank on the morning of March 20 27 .
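The sequence of events described in this section can be laid out as a simple timeline. The following is only an illustrative sketch that checks the intervals against the clock times reported above; it assumes (this is not stated explicitly in the reports cited here) that the 54 minutes before the pump started are counted from the beginning of the emptying operation at 22:21 hours. The variable and function names are ours and purely illustrative.

```python
from datetime import datetime, timedelta

# Clock times as reported in the accident description (March 14-15, 2001).
emptying_start = datetime(2001, 3, 14, 22, 21)    # emptying of the aft port EDT begins
tank_rupture = datetime(2001, 3, 15, 0, 22)       # aft starboard EDT bursts
second_explosion = datetime(2001, 3, 15, 0, 39)   # ignition of the released gas

# ASSUMPTION: the "54 minutes" before pump startup are counted from 22:21.
pump_startup = emptying_start + timedelta(minutes=54)


def minutes_between(earlier: datetime, later: datetime) -> int:
    """Return the whole number of minutes elapsed between two events."""
    return int((later - earlier).total_seconds() // 60)


intervals = {
    "emptying start -> pump startup": minutes_between(emptying_start, pump_startup),
    "emptying start -> tank rupture": minutes_between(emptying_start, tank_rupture),
    "pump startup -> tank rupture": minutes_between(pump_startup, tank_rupture),
    "tank rupture -> second explosion": minutes_between(tank_rupture, second_explosion),
}

for label, minutes in intervals.items():
    print(f"{label}: {minutes} min")
```

Under that assumption, the pump would have started at about 23:15 hours; the rupture follows the start of the operation by 121 minutes ("some two hours"), and the second explosion follows the rupture by 17 minutes, consistent with the description above.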

From the importance of shared management in high-risk incidental and accidental situations…
It is crucial to emphasize that the 11 dead crew members belonged to the brigade. Despite the intensity of the second explosion, it only hit those who had rushed to the immediate area of the accident, the brigade members. The report by the Internal Commission 13 (p. 10) points to the fact "that the brigade deployed directly to the site" as one of the items deserving attention, in the section on improvement of emergency procedures and plans. It suggests such measures as the use of portable gas detectors and communications systems during emergencies. The ANP/DPC report 14 suggests that the communications and coordination system between the emergency response crew and the rig's command proved deficient, but fails to provide specifics. The fire brigade's action deserves attention in this context. This raises the question: if there was no clarity as to the conditions associated with the course of events in a confined space, if the latter was not equipped with sensors that could properly back the brigade members' maneuvers inside it, and if communication was faulty between the brigade and the rig's command, was the position of the workers responsible for conducting emergency responses not extremely vulnerable? The following observation by operator A helps understand the degree of difficulty faced by those in charge of directly dealing with the accident, especially in the minutes after the first tank burst: "for starters, it was difficult to determine what had happened, there was no access to the actual site. We even thought that one of the columns had opened in the explosion and that the seawater had entered directly into the column, just to give you an idea". Such situations reveal the opacity of this kind of system and the related difficulties in interpretation by the operators 28 . There was also faulty classification of the risk areas, e.g., failure to place gas sensors next to the EDTs, inside the columns.
In this case, would it not have been more prudent for the brigade to avoid accessing the site? If each accident has specificities that make it a unique arrangement, we concede that it is impossible to determine in advance all the procedures to be adopted in future situations; otherwise we would be contradicting our basic frame of reference. Still, this does not relieve management of the responsibility for mapping vulnerable points in its planning, identifying the existing gaps, in order to gather more elements to assist in the difficult task of managing incidental and accidental situations in which emergency decisions are made. In this mapping process, who can contribute better than the workforce in listing, identifying, and discussing with managers the project gaps and flaws (of an organizational nature), through shared decisions and collective analysis of risk situations, thereby exercising a more shared form of management?
Meanwhile, we are aware that in emergency circumstances, with the possibility of an expanded accident, when a "crisis" situation takes hold, as described by Rogalski 29 , the decision-making reveals itself in all its "drama" 11 . The degree of tension can reach paroxysmal levels, and some managers become paralyzed, as Rogalski highlights, citing the article by Flin et al. 30 on the Piper Alpha accident.
In addition, in an emergency, the alarms all sound at once, producing excessive information for the operators, compromising their cognitive capacity to interpret the unfolding phenomena, a classic problem identified by the Ergonomics of Activity in the control rooms of complex systems 6,7,31 . As operator A expressed it, "all the alarms went off". With the high degree of automation, frequently accompanied by complexification of decision-making, the operator does not always have timely access to adequate information.
Despite the long list of serious and fatal offshore accidents, the challenge here is not to conduct an inventory of deadly events, but to learn lessons from them to accumulate knowledge on accident prevention, more precisely on the management of incidental and accidental situations. In the case of P-36, certainly the best option would have been to prevent the first mechanical rupture, "taking the system by the tail", in the jargon of some operators, to keep the system from entering into emergency shutdown. Our interview with operator A gave us a brief idea of the difficulties faced by the brigade after the first explosion.
But after all, how can management gather such elements to minimize or partially offset the gaps in reliability that end up allowing situations to emerge that are difficult to reverse? From this point on, the test becomes more acute, probing the system's resilience, i.e., its capacity to resist the demands without entering into collapse, or as stated by Hollnagel et al. 32 , to adjust its performance to the disturbances in order to continue to function.
When we expand this discussion on decision-making in critical situations, we invariably run up against the human factor's individual and collective role in the prevention of serious accidents. A good start is to grasp the human factor from a different perspective, rather than viewing it as the system's weak link (similar to human error), in order to focus on what it adds in the positive sense. As Mendel (1999, apud Llory 24 , p. 21) says, "if most latent accidents do not become active accidents, it is due to the agents' daily work and knowhow at all levels".
Llory 24 is heir to a tradition that highlights the gap or lack of linkage between the knowledge of the engineers and decision-makers and that of the operators, who are frequently underrated by the former 9 . For Llory 24 , the main obstacle to a healthy prevention policy lies in this lack of linkage. The author knows the limits of the notion that defends the gradual suppression of all human intervention, where the latter is seen as the main source of complex systems' "unreliability". Llory's position finds an echo in Reason, when the latter author states that it is part of the nature of complex systems, heavily interconnected, quite interactive, opaque, and partially modeled, to produce unfavorable surprises. Although it is possible for operating teams to create an appropriate framework of routines and procedures to recover from incidents, by simulating fictitious situations, "it is not certain that these will be pertinent to future events, except at a very generic level" 16 (p. 251).
If the operator's room for action is limited (since he is considered the system's "weak link", overridden by the technical device's purportedly greater reliability), such limitations on the freedom to react to the device mean that when the operator is called on to intervene in rare situations, he lacks the necessary resources to deal with the demands as they unfold.
According to Leplat 33 , the fact that the collective dimension can be enhanced positively or negatively, depending on the circumstances, emphasizes the importance of its management, which cannot be done only from the outside through rules imposed by the organization. That is, the human role should not be viewed in a reductionist way. Humans are not merely one more element in the system; they are also protagonists.
Given the growing incapacity of deterministic models to deal with increasingly sophisticated and varied situations, it is important to introduce flexibility as an essential trait of organizational models, allowing teamwork the freedom to better elaborate adapted solutions 31,34,35 , in permanent construction of commitments 22 .

…to some of the organizational factors
However, in light of the discussion here and the proposals by Llory & Montmayeul 21 , it is crucial to emphasize that safety in any industry cannot rely exclusively on the purportedly infallible reliability of field personnel, especially that of the operator, the "last link in the chain". Safety should be based on a structure that involves many activities, such as provisional risk studies, technical and organizational devices for correction, recovery, redundancy, etc. Beyond operator error and technical vulnerability, when the accident erupts, it ends up revealing the dysfunction of this complex organization as a whole. Inside it, actions are taken and decisions are made that can facilitate or hinder the operators' job, or even precipitate their error, besides providing the controls and means of recovery for actions by the operators themselves, or even allowing the identification and correction of "hot spots", since they can favor the occurrence of accidents or anticipate them. The following situations illustrate how some of these issues appeared in the P-36 accident.
One key factor was the location of the EDTs inside two of the rig's supporting columns, showing two problems exposed clearly by the accident. The first was the placement of tanks inside columns that were not protected against the accumulation of hazardous (explosive) substances. Although they were equipped with a ventilation system, it could fail in certain situations, as in the case of the accident. The second was the interconnection of these tanks with the process plant, emphasizing the decisive role of redundancy devices at strategic points, with the purpose of absorbing failures or anomalies in the system's functioning, since we know that its reliability is a function of its capacity to absorb failures over the course of production. Another point stands out in the report by the ANP/DPC 14 , concerning the functioning of the EDTs. According to the operations manual of the rig's process plant, in the normal functioning mode, the EDTs were supposed to remain isolated and only used in emergency situations involving emptying or storage. Thus, the frequent use of EDTs to store water contaminated with waste oil for most of the time in which the rig was in production was an abnormal procedure that acquired the status of normality, constituting what Wynne 36 calls "normal abnormalities".
The lack of gas sensors at certain strategic points and the simultaneous sounding of various alarms on the control room's panel clearly show that the discussion should focus on the quality and priority location of the information supplied (expressed here as sensors, alarms, etc.), at the appropriate time, and not simply the quantity of available information. The overall principles in this field should serve as backing, but they do not replace the specific analysis developed on each rig, in each real-life situation, since this complementarity in relation to the overall guidelines helps increase the system's reliability.
In addition, according to the CREA-RJ report 15 , the P-36 rig began operations without complying with all the deadlines and stages stipulated in the timetable for building, assembling, and operating. This allowed the final assembling of equipment in parallel with production activity, due to the shortened deadlines to meet the company's production targets. While such chronic problems cannot be linked directly and immediately to the causes of the accident analyzed here, they reinforce a culture that permeates many organizations, namely postponing (sometimes excessively) maintenance jobs or other types of interventions, due to the primacy of production targets, not infrequently with serious consequences for the system's safety and reliability.
The Internal Commission's report 13 provides no information on the way such decisions are built in daily practice, or even on who participates in them. Or, to what extent the safety.
Petrobras also underwent a major organizational restructuring process in the early 2000s, under the aegis of a new organizational model emphasizing business units, aimed at reaching targets of greater productivity, profitability, expansion, and internationalization. Under such an arrangement, the system's control focuses most heavily on meeting targets and results, giving these areas greater freedom to manage their projects and allowing a watering-down of responsibilities. This business units model, widely implemented by multinational petroleum companies, is mentioned by Le Coze 5 in his analysis of the Deepwater Horizon accident in the Gulf of Mexico. He highlights that with the creation of such units in the company, the heart of British Petroleum's business shifted explicitly from engineering to the commercial and financial management of those units, with strong impetus for outsourcing.

Final remarks
Corroborating part of our analysis, some experts 5,6,21,37,38 have highlighted the "organizational factors" in the analysis of accidents with international repercussions in recent decades, involving socially and technically complex systems.
The situation unveiled in Brazil with the advancing exploration of petroleum in the pre-salt layer, when faced with the indicators on accidents in the last 20 years 3 and with large-scale accidents like the P-36 oil rig and the Cidade de São Mateus FPSO, appears to confirm a substantial and hazardous mismatch between technological innovation and risk management 39 .
This emphasizes the fact that growing complexity (in Brazil's case, partly associated with this technological progress) makes it increasingly difficult to specify in detail the procedures entrusted to the collective 33 . This entails a major and virtually irreducible component of uncertainty and unpredictability. Humans should thus be left with sufficient autonomy to manage such situations. The failure to guarantee this margin of autonomy can jeopardize the system's reliability. If the operators are limited to performing predefined tasks, the errors tend to manifest themselves during the occurrence of exceptional events.
Still, we contend that for the management of petroleum companies to adopt this focus assumes an attentive eye and sensitive ear to the real work's nuances (as actually performed), combining the accumulated expertise in health and safety with the knowledge that emerges over the course of the activity, and which is frequently outweighed by formal hierarchies, norms, and procedures.

Contributors
M. G. Figueiredo participated in the study's conception, planning, data analysis and interpretation, critical revision of the content, writing and approval of the final version for publication. D. Alvarez and R. N. Adams contributed to the data analysis and interpretation, critical revision of the content, writing and approval of the final version for publication.

Figure 2
Illustration of the aft starboard column.