Correspondence model of occupational accidents

We present a new generalized model for the diagnosis and prediction of accidents among the Spanish workforce. Based on observational data of the accident rate in all Spanish companies over eleven years (7,519,732 accidents), we classified them in a new risk-injury contingency table (19×19). Through correspondence analysis, we obtained a structure composed of three axes whose combination identifies three separate risk and injury groups, which we used as a general Spanish pattern. The most likely or frequent relationships between the risk and injuries identified in the pattern facilitated the decision-making process in companies at an early stage of risk assessment. Each risk-injury group has its own characteristics, which are understandable within the phenomenological framework of the accident. The main advantages of this model are its potential application to any other country and the feasibility of contrasting different country results. One limiting factor, however, is the need to set a common classification framework for risks and injuries to enhance comparison, a framework that does not exist today. The model aims to manage work-related accidents automatically at any level.


INTRODUCTION
The aim of this paper is to present a generalized model of occupational accidents at a national scale, specifically for the Spanish workforce, by considering accidents as a compound risk-injury event. We aim to establish relationships of affinity between the component modes (categories) of these two variables, thereby generating a pattern against which Spanish companies can be analyzed. Although the model presented is multi-sectoral, the same methodology can be applied to identify specific patterns for each individual industrial sector (metals, transport, chemicals, etc.) if the goal is a more specific analysis of companies.
Risk assessment is currently an essential tool in managing safety in the workplace (Amendola 2002), and is used for predicting accidents among the workforce (Kjellén and Sklet 1995). This evaluation process involves three phases, all of them fundamental to the subsequent preventive action: identification, assessment and prioritization (Van Duijne et al. 2008, Frijters and Swuste 2008). In occupational accidents, all three phases are affected by the error associated with the subjectivity of the methodology used in the evaluation, so this methodology needs to be redirected towards more objective models (Leveson 2004). This article approaches the problem in relation to the three phases mentioned above.
The standard evaluation of safety among the workforce begins by identifying an "assumed" (hypothetical) risk in the workplace, either by means of free observation or by means of a formal checklist (Rouhiainen 1992). First, the evaluation uses tables of values to try to identify the probability and consequence variables at a general level (Fine 1973). Second, the risks are prioritized according to their importance.
An Acad Bras Cienc (2011) 83 (3)
The main handicap in the standard evaluation of the "assumed" risks identified in the various jobs in a specific company lies in the fact that these risks are isolated, independent events, which may or may not affect individuals (Conte et al. 2007). To characterize an accident based on a risk before it actually happens is of little use at present, as it is subject to the fundamental premise of uncertainty. Therefore, once the "assumed" risk has been identified, it is not possible to prove either when the injury will occur or if it will occur, or even its level of severity.
This change of scale, moving from the rate of accidents (population) to the actual accident (the individual), adds a high level of randomness to the evaluation techniques used: the classic criterion variables play no proven role in the identification of risk (Körvers and Sonnemans 2008). Consequently, only individual technical judgment prevails in the choice of the specific risk value.
In this sense, the conceptual generalization of risk proposed by Giddens (1994) and Beck (1999) is relevant. They consider risk to be the modern focus of the forecasting and control of future (undesired) consequences of human actions, combining two elements that until now have always been mutually exclusive: nature and society. They use the latter association to characterize present-day society, which they call a "risk society". According to these authors, a type of society is developing that will manage to overcome the problem of uncertainty generated by human actions or "manufactured uncertainty" (Giddens 1994) instead of by risks that can be forecast based on certain laws of science and natural systems (Beck 1999).
They see pre-industrial hazards as "events of destiny", whereas nowadays "industrial risks" pose the problem of a demand for social responsibility (accountability), so that the assessment of risks that have not taken place becomes the subject of prevention, compensation and expectation of preventive measures. In Spain, 40% of companies have accidents. It is therefore calculated that, in 60% of cases, the preventive risk action lacks any apparent use, as it attempts to control a problem that does not exist.
Trying to minimize the uncertainty caused by the assessment techniques used, this article suggests the use of a more deterministic alternative: we propose to base the risk assessment of the control of occupational accidents upon the analysis of specific accidents suffered by a company, and not upon a set of "assumed" risks whose analysis goes beyond reality.
Therefore, this evaluation begins with an overall assessment of the risks that have actually taken place in a given workforce, avoiding the concept of their individuality, which is physically associated with spatial (Nicholson 1998), temporal (Sari et al. 2009) and material uncertainty (Hammer 1994).
Our aim, therefore, is to identify the real risks from a log of accidents and summarize them in a contingency table; we shall reach the criteria needed for their assessment and prioritization by mathematical-statistical analysis.
This approach defines new quantifiable accident properties that, at least, help us to view the problem objectively, providing new control criteria based on a deeper knowledge about them.
Therefore, we have initially defined a general accident model (Conte et al. 2008), typical of a country, whose properties are applicable to any company within this country. The model is to be understood and used as a pattern, yardstick or standard for contrast (Garcia et al. 2009). We call it "acsom", the acronym of "accident soma" (accident body). To obtain it, we used a log, that is, a time series of the accidents that occurred in this country; each component of the series was recorded in a risk-injury contingency table, which summarizes the accident rate recorded in each annual period.
Conceptually, acsom represents an "equilibrium diagram" of accidents. As we considered each accident as a compound event derived from the risk-injury pair, we identified the risk-injury (RI) type that each accident comprises. The collection of these RI pairs for any given country constitutes its acsom-G, which is presented as a compensated outline of its accidents. It covers all the productive sectors, that is, all the positive and negative typological anomalies that characterize each area of activity. When these anomalies are put together, they combine and compensate one another. This produces a matrix (RI) diagram and marginal profiles (R or I), which are used as a standard of equilibrium. The local patterns, or acsom-S, belonging to each branch of production, are interpreted in the same way.
The isolated profiles, which are marginal to the contingency table and which can be obtained for each company, in comparison to the marginal profiles in acsom-G, show deviations of accidents of the company, thereby allowing us to identify which types follow the R or I equilibrium profile, and which types deviate above or below it.
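The comparison of marginal profiles described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `profile` and `deviations` are hypothetical, the counts are invented rather than taken from Table I, and the deviation is taken as a simple signed difference of relative frequencies, which the paper does not specify.

```python
def profile(counts):
    """Convert raw category counts into relative frequencies (a marginal profile)."""
    total = sum(counts)
    return [c / total for c in counts]


def deviations(company_counts, pattern_counts):
    """Signed difference between the company profile and the pattern profile."""
    company = profile(company_counts)
    pattern = profile(pattern_counts)
    return [c - p for c, p in zip(company, pattern)]


# Hypothetical counts for four risk categories (not data from the paper).
pattern_counts = [400, 300, 200, 100]  # marginal of the reference pattern
company_counts = [10, 30, 40, 20]      # marginal of one company

dev = deviations(company_counts, pattern_counts)
above = [i for i, d in enumerate(dev) if d > 0]  # categories above equilibrium
below = [i for i, d in enumerate(dev) if d < 0]  # categories below equilibrium
print(dev, above, below)
```

Categories whose deviation is close to zero follow the equilibrium profile; how close counts as "close" is a choice the analyst would have to make.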
By means of a correspondence analysis, we present a global model (acsom-G) for accidents whose underlying data structure is made up of three groups of risks and injuries. We identified these three groups by colors (red, yellow and green) to recognize them visually. The colors do not indicate the level of seriousness of each group, but the features associated with the frequency of occurrence.
The correspondence model has been applied by various authors to the study of occupational accidents, although they have used it for the study of specific cases. Laflamme et al. (1991) applied it to the typology of accidents in a Canadian car company and in a transport company. Williamson et al. (1996) analyzed 1738 industrial accidents in Australia to discover the relationship between the work activity carried out in the workplace where the accidents happened and their nature. Baril et al. (2003) applied this methodology to a population of 13,728 injured people in order to establish the relationship among the activities of the injured workers, the types of injury suffered and the way the company deals with casualties.
Correspondence analysis was selected (Benzécri 1992, Greenacre and Blasius 1994) as the most suitable method to optimize the initial matrix functions: it reduces the information contained in the contingency table and establishes affinity relationships among the variable components of the table, thus yielding a classification based on factorial coordinates (Joaristi and Lizasoaín 1999). Moreover, to obtain models of accidents in companies, correspondence analysis is undoubtedly the most suitable method because of its great power and flexibility. It makes no difference whether the table is complete or incomplete, that is, whether it presents structural or sampling zeros, since the degree of association among the variables is defined by their eigenvalues: $\sum_i \lambda_i = \chi^2 / N$ ($\lambda_i$ = eigenvalues from the diagonalization of the matrix; $\chi^2$ = Pearson chi-square value of the contingency table; $N$ = total frequency of the table).
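As an illustration of the relationship between the eigenvalues and the chi-square statistic, the following sketch runs a plain correspondence analysis through a singular value decomposition with NumPy. It is a minimal textbook formulation, not the software used by the authors; the function name `correspondence_analysis` and the 3×3 table are assumptions made for the example, and the code assumes that no row or column total is zero.

```python
import numpy as np


def correspondence_analysis(table):
    """Plain correspondence analysis of a two-way contingency table via SVD."""
    n_total = table.sum()
    P = table / n_total                      # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = sv ** 2                    # principal inertias
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return eigenvalues, row_coords, col_coords, r, c


# Hypothetical 3x3 risk-by-injury table (not data from the paper).
table = np.array([[20.0, 5.0, 2.0],
                  [4.0, 30.0, 6.0],
                  [3.0, 7.0, 25.0]])
eig, F, G, r, c = correspondence_analysis(table)

# Pearson chi-square computed directly from the counts.
N = table.sum()
expected_counts = np.outer(table.sum(axis=1), table.sum(axis=0)) / N
chi2 = ((table - expected_counts) ** 2 / expected_counts).sum()

print(eig.sum(), chi2 / N)  # the two values agree up to rounding
```

Because the residuals are centered on the independence model, the last eigenvalue is numerically zero, and zero cells inside the table cause no difficulty as long as the margins are positive.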
The contingency table obtained (Table I) shows three key elements: the total value, the marginal profiles, and the central body of the table or matrix. Each of these identified elements can be analyzed separately by using different methodologies.
In summary, the method is to compare the features of the accidents of a specific company with the features of its pattern of reference (acsom-G or acsom-S). This methodology is applicable to any company, regardless of size. In addition, one can automatically obtain forecasts and predictions of accidents, and prioritizations of risks and injuries. Finally, it enables follow-up in real time by implementing adequate control resources.

MATERIALS
In our model, an accident in the workforce is considered a compound event, composed of risks (R) and injuries (I). The risk is understood as the basic generating and component unit of the accident, and refers to the physical process inducing the injury. The injury, as the material evidence of one or more risks, is the basic compositional element or biological product from which the occurrence of an accident to the individual is identified.
We have taken into consideration all the reports on occupational accidents notified over eleven years (7,519,732 accidents), registered (Ministerial Order 16-12-1987, BOE 311, of 29th December) and published by the Spanish Ministry of Labor (Secretaría General Técnica, Subdirección General de Estadísticas Sociales y Laborales). The risks and injuries mentioned in these notifications and reports are codified following the criteria of the International Labour Organization presented at the Tenth International Conference of Labour Statisticians in 1962. The data obtained are summarized in a contingency table of 19 risks (R) by 19 injuries (I), titled the starting risk-injury matrix.
The 19 categories of each variable (R or I) have a disjunctive and exhaustive codification. We considered the categories of the two initial multiple nominal variables as binary nominal variables, thereby obtaining 38 variables.
As the chosen variables R and I have a generic and exhaustive character, each one can be further subdivided, if it is useful, into other more specific derivatives of the main variable of reference.
To obtain an affinity relationship model among the variable components of acsom-G, we carried out the analysis on the average year matrix (Table I). It must be understood that, in order to obtain annual contingency tables, a Poisson-type design (Aguilera 2001) was followed: the original frequencies that form the cells of the contingency table for each period are independent random variables with Poisson distribution, and are filled freely. This basic table, or average year, of fictitious values appears as an incomplete table, which does not verify the hypothesis of symmetry of population probabilities and shows heterogeneity in its marginal distributions. It defines a theoretical body of yearly accidents, which allows the analytical determination of an acsom-G pattern. The list of codes is as follows:
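A minimal sketch of how an average-year table could be built from annual contingency tables follows. It assumes that the average year is the cell-wise arithmetic mean of the annual tables, which the text does not state explicitly; the function name `average_year` and the 2×2 tables are hypothetical.

```python
def average_year(tables):
    """Cell-wise mean of a list of equally shaped annual contingency tables."""
    n_years = len(tables)
    n_rows = len(tables[0])
    n_cols = len(tables[0][0])
    return [
        [sum(t[i][j] for t in tables) / n_years for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three hypothetical annual 2x2 risk-by-injury tables (not data from the paper).
annual = [
    [[10, 0], [4, 6]],
    [[12, 1], [2, 8]],
    [[14, 2], [0, 4]],
]
avg = average_year(annual)
print(avg)
```

The averaged cells are in general not integers, which is consistent with the description of the average year as a table of fictitious values.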

METHODS
The risk-injury matrix (Table I) was initially analyzed using various unsupervised learning techniques from data mining (Hand et al. 2001), that is, exploratory multivariate statistical techniques, with the aim of identifying variable groups and verifying the results obtained, so as to define a global pattern (global model or acsom-G). Similarly, local patterns (local model or acsom-S) have been obtained for each branch of activity to reflect the features typical of the industrial area to which each refers; they are outside the scope of this paper.
When the accident rate among the workforce is used, the profiles in the contingency table follow compound binomial and Poisson probability models (Rubio 1983), interpretable as a whole as multinomial distributions (Aguilera 2006) (Figures 1-2). Although a bar diagram would be the ideal representation, a polygon is used to show more clearly the differences among the various types of variables.
After selecting the limited factorial model, we verified its characteristics in view of the absolute contribution of the categories to each factor and the relative contributions of each category to the building of each axis.The suitability of the factorial model is checked by examining the variances shared by the chosen factorial axes, for each variable, and their correlation matrix.
The comparison among the categories comprising factorial planes and axes shows different features associated with the risk and injury variables, which we shall interpret at a later stage.
We have also corrected the active symmetries of the factor axes that appeared at some point in the process of systematic dimensional reduction. These active symmetries alter the position of the set of groups without affecting their relative positions (Real 2001). Therefore, by means of an adequate homographic transformation, the set was returned to its original position without altering the features of the initial configuration obtained.
To quantify the affinity relationships among different modes, the distances between pairs of points are calculated using the generalized Minkowski distance on the factor coordinates. This is necessary because the three-dimensional graphic representation is rather complex and, in two dimensions, errors can easily be made because the orthogonal projection deforms the distances among modes. It is therefore possible to prioritize analytically the injury group concerning a specific risk, that is, to define an order based on the proximity of the injuries to that risk. Conversely, it is possible to define an order based on the proximity of the risks to a given injury.
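The ordering by proximity can be sketched as follows. The coordinates are invented for illustration, the labels `I_a`, `I_b` and `I_c` and the function names are hypothetical, and the order `p` of the Minkowski distance is an assumption (`p=2` gives the Euclidean distance), since the text does not state which order was used.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two points of equal dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def rank_by_proximity(target, candidates, p=2):
    """Return candidate labels sorted from nearest to farthest from the target."""
    return sorted(candidates, key=lambda label: minkowski(target, candidates[label], p))


# Hypothetical factor coordinates on three axes (not values from Tables IV and V).
risk = (1.0, 0.0, 0.0)
injuries = {
    "I_a": (1.0, 0.5, 0.0),
    "I_b": (-1.0, 0.0, 2.0),
    "I_c": (0.0, 0.0, 0.0),
}

order = rank_by_proximity(risk, injuries)
print(order)  # injuries listed from most to least affine to the risk
```

Swapping the roles of the arguments, with one injury as the target and a dictionary of risks as candidates, gives the reverse prioritization.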

RESULTS
The factors or dimensions obtained by the correspondence analysis for the variables under consideration are shown in Table II, along with the percentage of total variance that each factor explains. The absence of the trivial solution (λ 1 = 1) shows that the analysis has been carried out on the centers of gravity of rows and columns. As the rows are quasi-barycenters of the columns and vice versa, they allow simultaneous graphic representation.
We selected the first three factors as a limited model (Table II), representing 72.9% of the total variance of the risk and injury variables. This selection fulfills Hair's criterion (Hair et al. 1999), which recommends selecting all the dimensions with inertia greater than 0.2.
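The selection rule can be illustrated with a short sketch. The eigenvalues below are invented and do not reproduce Table II; the 0.2 threshold is the one quoted in the text, and the function name `retained_axes` is hypothetical.

```python
def retained_axes(eigenvalues, threshold=0.2):
    """Keep the dimensions whose inertia exceeds the threshold."""
    return [ev for ev in eigenvalues if ev > threshold]


# Hypothetical principal inertias, sorted in decreasing order.
eigenvalues = [0.40, 0.30, 0.25, 0.10, 0.05]

kept = retained_axes(eigenvalues)
share = sum(kept) / sum(eigenvalues)  # proportion of total inertia retained
print(len(kept), round(100 * share, 1))
```

With these made-up values three axes pass the threshold; the retained share of inertia is the quantity that the text reports as 72.9% for the real data.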
Figure 3 shows the spatial disposition of the three axes or dimensions chosen as a solution. Each factorial axis is composed of a linear combination of the categories belonging to two different groups. Thus, some of the categories in the green group characterize Dim1's positive side, and some of the categories in the yellow group characterize its negative side. Dim2's positive side is characterized by the rest of the categories in the green group, and its negative side by the categories in the red group. The red group characterizes Dim3's positive side, while its negative side is characterized by category R 13, which belongs to the yellow group. Therefore, we defined three semi-planes corresponding to each of the mentioned groups (Table III).
This three-group solution (Table III) is well characterized by each of the three factorial hyperplanes: group 1 (green), hyperplane (Dim1, Dim2); group 2 (yellow), hyperplane (Dim1, Dim3); and group 3 (red), hyperplane (Dim2, Dim3). The factorial axes show a mixture of categories: Dim1 (green and yellow categories), Dim2 (green and red categories), and Dim3 (yellow and red categories). The fact that Dim2 represents few categories in the green group is a consequence of the low variance that these categories share with the rest of their group. This difference also separates the green categories into two sub-groups.
Tables IV and V give the factor scores obtained for each of the studied variables, calculated for the first three dimensions, which are sufficient to project the three stated groups. This three-dimensional approach is the one that defines the correct affinity relationship among the analyzed variables. The above-mentioned relationships must be interpreted with care when using the two-dimensional projections obtained by decomposing the three-dimensional (cube) solution, because of the distortions that plane projection imposes on the results space.
The absolute contribution of a variable to a dimension indicates the percentage of inertia (variance) of this dimension attributable to that variable. Tables VI and VII show the variables that are most important or best characterize the chosen dimensions. For the risk variables, 72.6% of the inertia of Dim1 is due to two variables: 62.1% from R 10 (projection of fragments or particles) and 10.5% from R 14 (exposure to heat contacts). For Dim2, 86.2% of the inertia is distributed among three variables: 47.9% from R 14 (exposure to heat contacts), 27.2% from R 10 (projection of fragments or particles), and 11.1% from R 16 (exposure to chemical contacts). In the case of Dim3, two variables represent 75.8% of the inertia: 58% from R 13 (overexertion) and 17.8% from R 9 (bruises, contusions and cuts by objects or tools).
For the injury variables, 77.4% of the inertia of Dim1 is distributed between two variables: 57.3% from I 11 (objects in the eyes) and 20.1% from I 13 (burns). For Dim2, 93.7% of the inertia is distributed between 28.9% from I 11 (objects in the eyes) and 64.8% from I 13 (burns). For Dim3, 82.3% of the inertia is distributed among 43.3% from I 4 (back pain), 15.4% from I 10 (bruises, contusions and crushing), 13.8% from I 8 (other injuries), and 13.7% from I 3 (twists, sprains and strains).
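For reference, the absolute contribution used above has a standard closed form in correspondence analysis. The notation is introduced here for illustration and is not taken from the paper: $f_i$ is the mass (relative marginal frequency) of category $i$, $F_k(i)$ its principal coordinate on axis $k$, and $\lambda_k$ the eigenvalue of that axis.

```latex
\mathrm{CTA}_k(i) = \frac{f_i \, F_k(i)^2}{\lambda_k},
\qquad
\sum_i \mathrm{CTA}_k(i) = 1
\quad \text{because} \quad
\sum_i f_i \, F_k(i)^2 = \lambda_k .
```

Under this definition, a category contributes strongly to an axis either because it lies far from the origin on that axis or because it carries a large mass.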
Figures 4 and 5 represent the risk forms (R 10 for Dim1, R 14 for Dim2, and R 13 and R 9 for Dim3) and injury forms (I 11 for Dim1, I 13 for Dim2, and I 4 and I 10 for Dim3), respectively, that contribute most to the formation of each axis where the centroids of the groups are located. These forms have been projected orthogonally onto the three planes formed by the three-axis solution.
Considering the risk and injury variables together, for Dim1 the relationship of R 10 with I 11 (the variables that contribute most to the inertia) stands out, which explains the qualitative meaning or affinity of this risk-injury pair. In the same way, relationships are verified among other studied variables: R 14 with I 13 (Dim2), and R 13 with I 4 and R 9 with I 10 (Dim3).
The variables that most influenced the inertia of a dimension are also those nearest to the centroid. The relative contribution of a dimension to a variable (Tables VIII and IX) measures the correlation between the dimension and the variable. It indicates the proportion of the inertia of the variable explained by the dimension. The sum of the relative contributions of a variable is equivalent to the concept of shared variance (communality) used in classic factor analysis.
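The relative contribution can be written in the usual squared-cosine form. The symbols are notation assumed here rather than the paper's own: $F_k(i)$ is the principal coordinate of category $i$ on axis $k$, and $d^2(i, G)$ is its squared chi-square distance to the centroid $G$.

```latex
\mathrm{CTR}_k(i) = \frac{F_k(i)^2}{d^2(i, G)} = \cos^2 \theta_{ik},
\qquad
d^2(i, G) = \sum_{k} F_k(i)^2 ,
\qquad
\mathrm{QLT}_K(i) = \sum_{k=1}^{K} \mathrm{CTR}_k(i) \le 1 .
```

Summed over the $K$ retained axes (three in this model), the quantity $\mathrm{QLT}_K(i)$ plays the role of the communality; it reaches 1 only when all axes are kept.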
This situation shows the discontinuity that appears in the frequencies of the initial contingency table, which splits into a sub-table of high frequencies and a sub-table of low frequencies. The sub-table of high frequencies is largely characterized by Dim3, while the low frequencies are represented by the other two dimensions.
With regard to the data, the variance shared among the chosen factorial axes has been analyzed for each variable, as well as the correlation matrix among them. Both the shared variances and the correlations obtained indicate that the factorial axes are independent and, therefore, that the model represents the information suitably, that is, that the adopted solution is stable.
Figures 6 and 7 present the previously achieved results. The aim is to verify the scattering of groups.
Table X shows the relationships of affinity between the risk and injury vectors calculated as Minkowski distances, forming a decision criterion of great interest in their forecasting and prioritizing.

DISCUSSION
The methodology presented overcomes certain limitations of the classical analytical methods for accident rates. These are the "free risk assessment methods", which use tables valued on a qualitative or quantitative ordinal scale but are severely limited by the direct and subjective assignment of risk values, and the "logical methods", based on the analysis of probability trees (event and fault trees), in which most of the starting probabilities are usually estimated rather than calculated from observations. This methodology allows the calculation of probabilities associated with the diverse nature of occupational accident rates, provides the basis for an in-depth analysis of the observed frequencies, and greatly exceeds the analytical scope of the methods currently in use described above.
Correspondence analysis is the core of this new methodology, but its full development exceeds the scope of this article. A basic concept that it provides is that of a "population pattern of accident rates", named acsom, which can support various studies aimed at characterizing (the study of masses and potentials), comparing (the study of company-pattern deviations) and controlling (preventive action plans) corporate accident rates, mainly regarding the frequency of their occurrence and their seriousness.
As indicated by Schroeder-Frechette (1999), the multi-dimensional approach to risk must take into account the following ethical problems: (1) who defines the risk and how it should be defined, (2) who evaluates the risk and in accordance with what rules, and (3) under what conditions it is ethically acceptable to impose risks upon society.
The problem of occupational accidents is a restricted variant of the problem of risk, as defined by Giddens (1994) and Beck (1999), appearing, at least in its manifestation, as a reflection in the physical-natural world.
Regardless of the existence of human beings, the risk of accidents will continue to exist as a natural phenomenon that may happen, as in fact it does, to any other biological species. The frequency at which accidents occur (the accident rate) is increased by socio-economic activity, and their control is ethically obligatory as they cause injuries of varying intensity (seriousness) to the health of individuals.
The correspondence model, or joint probability model, of accident rates in the Spanish workforce reproduces risk-injury groups similar to those obtained through other analyses based on the study of rows and columns: principal components, multidimensional scaling and hierarchical clustering analysis. Correspondence analysis is the method that best establishes the relationships between risk and injury, and is thus confirmed as the most suitable analytical method for the problems set out above.
As for the groups, group 1, or the green group, includes all those risk and injury variables related to technological problems of recent historical appearance (the industrial revolution) and related to scientific and technical development (Baram 2009, Rasmussen 1997). Group 2, or the yellow group, contains all those risk and injury variables related to evolutionary biomechanical problems (Nachreiner et al. 2006). Group 3, or the red group, contains all those risk and injury variables related to technical-cultural problems (Guldenmund 2000) and to the evolution of their activity.
The groupings also match the results obtained by Douglas and Wildavsky (1982), who indicate the lack of differences between the hazards posed in early history (red and yellow groups) and those of developed civilizations (red, yellow and green groups), except in the type of cultural perception and the way in which a civilization has organized itself into a global society.

RISK AND INJURY VARIABLES OF THE FIRST FACTOR
In the first factor, the categories of the green group {R 10, R 16, R 17} form the positive side of dimension 1, and the categories of the yellow group {R 1, R 2, R 6} form its negative side. This dimension is associated with projections of fragments at the macroscopic or microscopic scale, with exposure to solid, liquid or gaseous chemical substances, and with radiation or exposure at a subatomic scale (green group), and with accidents due to anomalies of gravitational interaction, such as falls of persons and treading on objects (yellow group). This dimension can be interpreted as the physical process of "projections" from the environment on the individual (green group) or from the individual on the environment (yellow group).

RISK AND INJURY VARIABLES OF THE SECOND FACTOR
Axis 2 is formed by the linear combination of variables of the green group {R 14, R 15, R 18}, whose common factor is thermal effects, the resultant injury mostly being burns (trauma-type, thermal-type); these variables contribute to the positive side. On the other side, R 19 from the red group, in which the fewest injuries are of trauma-type generated by the interaction with living beings (including falls, bruises, strokes, blows, shocks, bites, stings, etc.), is the main contribution to the negative side of the axis.

RISK AND INJURY VARIABLES OF THE THIRD FACTOR
Axis 3 is formed by the yellow group, with R 13 forming its negative side, while its positive side is formed by the variables of the red group {R 3, R 4, R 5}, which represent "fall of objects", and by {R 7, R 8, R 9}, which represent bruises, blows and collisions against or by objects. R 11 represents the cases of being caught by objects, and R 12 represents the cases related to mobile machinery and traffic in the workplace (excluding accidents while travelling). Axis 3 represents the trauma-type injuries caused to the individual by external agents or by the individual himself.

FACTORS COMPOSITION: FACTORIAL PLANES
The combination of the factorial axes defines three factorial hyper-planes, which represent the risk and injury groups. Thus, axes 1 and 2 define the green group (environmental risk and injury), axes 1 and 3 define the yellow group (risks associated with the individual and musculoskeletal injuries), and finally axes 2 and 3 define the red group (individual mixed risk and trauma-type injuries).

GREEN GROUP: PLANE (DIM1, DIM2)
In terms of the industrial accident, this group is identified with risks associated with the work environment and injuries caused by the environment.
The green group is characterized by the low frequency of occurrence of its component variables, the temporal instability of their relative frequencies and the highest accumulation of mass in two or three injury variables. Figures 1 and 2 represent the multinomial distributions corresponding to a risk (R 15) and an injury (I 17), respectively, from the green group. Individuals are presented as a passive element in the individual-environment interaction, without the ability to respond to an accident.

YELLOW GROUP: PLANE (DIM1, DIM3)
The yellow group is characterized by the high occurrence of risk, the temporal stability of its relative frequencies, and the highest accumulation of mass in one or two injury variables (heterogeneity in the distribution). Figures 1 and 2 represent the multinomial distributions corresponding to a risk (R 6) and an injury (I 3), respectively, from the yellow group. Individuals are presented as an active (dynamic) element in the individual-environment interaction, and responsive to the accident. The environment is a passive (static) element.

RED GROUP: PLANE (DIM2, DIM3)
The red group is characterized by the high occurrence of the risk, the temporal stability of its relative frequencies, and the distribution of the principal mass in 5 or more categories (greater homogeneity in the distribution). Figures 1 and 2 represent the discrete multinomial distributions corresponding to a risk (R 4) and an injury (I 10), respectively, from the red group. Both the individual and the environment can be active elements in the interaction.

CONCLUSIONS
The presented risk-injury correspondence model results in three groups of risks and injuries.The advantage over other factorial models stems from the joint treatment of risks and injuries, thereby obtaining groupings composed of the variables.These three groups, called green (technological/environmental), yellow (biological/evolutionary) and red (technical/cultural) groups, have been verified as a necessary and sufficient condition for the abbreviated representation of occupational accident rates.
These risk and injury groupings define a pattern that we called "accident soma" or acsom-G, which should be understood as a global model that represents the balancing conditions of occupational accidents in a population, enabling multiple specific analyses of companies to be carried out.
Based on the presented result, new possibilities are opened for the development of applications focused on the automatic analysis, interpretation and management of occupational accidents, thereby minimizing uncertainty and providing an objectivity that current methods do not offer.
Key words: correspondence model, contingency analysis, risk, injury, occupational accidents.

Fig. 3 - Characteristics of each factor according to the component categories.

Fig. 6 - Factor model: groups of risks for three axes. (Red group without label; points projected on coordinate planes).

Fig. 7 - Factor model: groups of injuries for three axes. (Red group without label; points projected on coordinate planes).

Table V - Injury variables.