
COMPOSITIONAL STATISTICAL MODELS UNDER A BAYESIAN APPROACH: AN APPLICATION TO TRAFFIC ACCIDENT DATA IN FEDERAL HIGHWAYS IN BRAZIL

ABSTRACT

This study applies a compositional statistical model under a Bayesian approach, using Markov Chain Monte Carlo simulation methods, to road traffic victims on federal roads of Brazil in a specified period of time. The main motivation is a database with information on the injury severity of each person involved in an accident on federal highways in Brazil from January 2018 to April 2019, as reported by the federal highway police of Brazil. Four types of events associated with each person involved (uninjured, minor injury, serious injury and death) are grouped for each state of Brazil in each month, characterizing compositional multivariate data. Such data require specific modeling and inference approaches that differ from the traditional multivariate models based on multivariate normal distributions. The proportions of events associated with the accidents (uninjured, minor injuries, serious injuries and deaths) are treated as a sample of vectors of proportions that sum to one, together with covariates such as pavement conditions in each state, regions of Brazil, months and years that may affect the severity of the injury of each person involved in an accident. The results indicate that the proportions of serious injuries and deaths are affected by some covariates, such as the region of Brazil and the year.

Keywords:
accident victims; types of injuries; deaths; federal highways; compositional data; Bayesian approach

1 INTRODUCTION

Traffic accidents are a major global public health problem, with a death toll that reached 1.35 million in 2016. With the fast increase in the number of vehicles in circulation and the lack of monitoring infrastructure, especially in third world countries, the situation tends to get worse. According to a report from the World Health Organization (World Health Organization et al., 2018), as progress is made in the prevention and control of infectious diseases, the number of deaths from non-communicable diseases and injuries has increased significantly in recent years.

Traffic injuries are already the eighth leading cause of death across all age groups and the leading cause of death for children and young adults aged 5 to 29 years. A reduction in traffic deaths has already been observed in more developed countries, but the situation is catastrophic in most emerging and poor countries. There is a strong association between the risk of death in traffic and the income level of a country. With an average rate of 27.5 deaths per 100,000 inhabitants, the risk of death in traffic is three times higher in low-income countries than in high-income countries, where the average rate is 8.3 deaths per 100,000 inhabitants. In addition, the number of road traffic fatalities in low and middle-income countries is disproportionately high relative to the size of their populations and the number of motor vehicles in circulation (see Table 1).

Table 1
Proportion of population, traffic deaths and number of registered vehicles by country in 2016 (income based on World Bank classification in 2017).

In many emerging countries, including Brazil, this problem is aggravated by a number of factors, including low educational attainment and severe infrastructure problems on highways and urban roads (see, for example, World Health Organization, 2018; Bhalla et al., 2014; Waiselfisz, 2013; Bahadorimonfared et al., 2013; Bacchieri & Barros, 2011; Jorge et al., 2009; Marín-León et al., 2012; Andrade & Mello-Jorge, 2016; Marín & Queiroz, 2000; Lyons et al., 2008). In Brazil, the high number of accident victims, especially those with serious injuries, has been a challenge for the Unified Health System (SUS) (Malta et al., 2012; Jorge et al., 2008; Silva & Andrade, 1996; Klein, 1994; Jorge et al., 1994; Haagsma et al., 2016). It is also observed that the number of deaths at the crash site on Brazilian highways is very large compared to other emerging countries and first world countries. Many studies related to road improvement under an operational research approach are presented in the literature (see, for example, Martínez et al., 2017; Castro Aragón & Leal, 2003; Novaes, 2001), but not so many are related to traffic accidents. Among the studies related to road accidents under an operational research approach we could cite Baykal-Gürsoy et al. (2009), Szwed et al. (2006), Haastrup (1994), Assimizele et al. (2020) and Mekker et al. (2018).

Traffic accident death rates in Brazil are surpassed only by those of India, China, the United States and Russia (World Health Organization, 2018). Between 1980 and 2011 nearly one million people died from traffic accidents in the country, despite the new law introduced in 1998 (the Brazilian Traffic Code, or CTB), which established conduct rules, infractions and penalties for drivers, and the changes made to the CTB in 2008, which established stricter penalties for drunk drivers (Abreu et al., 2018).

Road transport is Brazil’s main logistics system, with a network of 1,720,700 kilometers of national roads and highways (Boletim Estatistico do CNT, 2018), the fourth largest in the world (CIA World Factbook, Brazil), which carries 61.1% of all cargo handled in the country (Boletim Estatistico do CNT, 2018). This highway system, which includes many old, poorly designed, single-carriageway and poorly signposted roads, is the main means of transporting cargo and passengers in the country. This kind of transport system has been used since the beginning of the republic, when governments began to prioritize road transport over rail and river transport. Under the epidemiological classification, traffic accidents stand out among the external causes of mortality (ICD-10 codes V01 to Y98; WHO, 1993): in the period from 1977 to 1986 the traffic accident mortality rate in Brazil rose from 16 to 22 per 100 thousand, a 38% increase (Barros et al., 2003).

2 METHODOLOGY

This study considered a database of victims of road accidents (land transport accidents, ICD-10 headings V01 to V89; World Health Organization, 2004) reported by the federal highway police (PRF) of Brazil for all federal highways in the period from January 1, 2018 to April 30, 2019, covering all states of the federation (https://www.prf.gov.br/portal/dados-abertos/acidents). For each victim, the police reported the type of injury (unharmed, minor injury, serious injury or death) and some important factors such as cause of the accident, type of accident, phase of the day, weather condition, type of track, road layout, age of the victim, gender of the victim and type of vehicle. This information is described in the accident reports prepared by the road police officers for each road accident. In this paper the data are grouped as monthly compositional data (observed proportions of unharmed, lightly injured, severely injured and victims who died at the accident site) for each federative unit of Brazil. The data set is presented in Table A1 in the appendix at the end of the manuscript. Table 2 shows the total number of casualties in each class (unharmed, mild, severe, death) from January 1, 2018 to April 30, 2019 for all units of the federation. Figure 1 shows the box plots of each class (unharmed, mild injury, severe injury and death) for all federative units of Brazil. Figure 2 shows the time series for the proportions %unharmed, %mild, %severe and %death. Figure 3 presents time series plots of the proportions observed for the 27 federative units of Brazil.
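The grouping of individual victim records into monthly compositions per federative unit can be sketched as follows. This is a minimal illustration, not the authors' processing code: the record layout `(state, month, injury_class)`, the class labels and the example records are all hypothetical. Note that a group with a zero count in some class yields a zero proportion, which would need special handling before any log-ratio transformation; the text does not describe how such cases were treated.

```python
from collections import Counter, defaultdict

CLASSES = ("unharmed", "mild", "severe", "death")

def monthly_compositions(records):
    """Group victim records into one composition per (state, month).

    `records` is an iterable of (state, month, injury_class) tuples; this
    layout is a hypothetical stand-in for the police accident reports.
    """
    counts = defaultdict(Counter)
    for state, month, injury in records:
        counts[(state, month)][injury] += 1
    compositions = {}
    for key, c in counts.items():
        total = sum(c[k] for k in CLASSES)
        # Proportions in the fixed order of CLASSES; they sum to one.
        compositions[key] = tuple(c[k] / total for k in CLASSES)
    return compositions

# Made-up records, for illustration only.
records = (
    [("SP", "2018-01", "unharmed")] * 6
    + [("SP", "2018-01", "mild")] * 3
    + [("SP", "2018-01", "severe")] * 1
    + [("MG", "2018-01", "unharmed")] * 2
    + [("MG", "2018-01", "mild")] * 1
    + [("MG", "2018-01", "death")] * 1
)
comps = monthly_compositions(records)
```

Each value of `comps` is a vector of four proportions for one federative unit and month, which is the form of data analyzed in the remainder of the paper.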

Table 2
Total count of occurrences in each class (unharmed, mild, severe, death) from January 1, 2018 to April 30, 2019 for all federation units (FU).

Figure 1
Box plots for the proportions (unharmed, mild, severe, death) by each federative unit.

Figure 2
Time series for the proportions (%unharmed, %mild, %severe, %death).

Figure 3
Compositional proportions (unharmed, mild, severe, death) by federative unit.

From the box plots of Figure 1, it is possible to see that some states, such as São Paulo (SP), present a greater proportion of unharmed victims of road accidents, while other states, such as Minas Gerais (MG), present a smaller proportion of unharmed victims when compared to the other federative units of Brazil. It is also observed that the proportion of severe injuries is smaller for São Paulo (SP) than for the other federative units, while some northeastern federative units, such as Alagoas (AL), Maranhão (MA), Sergipe (SE) and Rio Grande do Norte (RN), show large proportions of severe injuries in comparison to the other federative units of Brazil.

2.1 Modeling of Compositional Data

Compositional data are vectors of proportions specifying $G$ fractions of a total. Denoting by $x = (x_1, x_2, \ldots, x_G)$ a compositional vector, we must have $x_i > 0$ for $i = 1, \ldots, G$ and $x_1 + x_2 + \cdots + x_G = 1$. Compositional data often result when raw data are normalized or when data are obtained as proportions of a certain heterogeneous amount. These conditions are usual in geology, economics and biology. Standard methods for analyzing multivariate data under the usual assumption of a multivariate normal distribution (see, for example, Johnson et al., 2002) are not appropriate for compositional data, because of the compositional constraints. Different modeling approaches have been considered to analyze compositional data. A first model was based on the Dirichlet distribution, but this model implies a completely negative correlation structure, which is not what is observed in compositional data, where some correlations are positive (see, for example, Aitchison, 1982; Aitchison & Shen, 1980).

Aitchison & Shen (1980) introduced the logistic-normal distribution to analyze compositional data, transforming the vector $x$ of $G$ components into a vector $y$ defined in the real coordinate space $\mathbb{R}^{G-1}$ by means of an additive log-ratio (ALR) function. Rayens & Srinivasan (1991) extended the ALR transformation by considering Box & Cox (1964) transformations as a generalization of the log-ratio function. Another possibility is the isometric log-ratio (ILR) transformation (Egozcue et al., 2003; Martín Fernández et al., 2015), but the inverse transformation needed to recover the proportions in each class is computationally more complex, and the results are very similar to those obtained with the ALR transformation (see, for example, Martinez et al., 2020). Classical inference for these models is usually very difficult, especially in the presence of a covariate vector. Bayesian methods (see, for example, Gelman et al., 2013) are a good alternative for analyzing compositional data (see, for example, Iyengar & Dey, 1996, 1998; Tjelmeland & Lund, 2003; Shimizu et al., 2015), especially with Markov Chain Monte Carlo (MCMC) methods (see, for example, Gelfand & Smith, 1990; Smith & Roberts, 1993).

Thus, the compositional data introduced in Table A.1 are denoted by $x_{1i}$ = % unharmed, $x_{2i}$ = % mild injuries, $x_{3i}$ = % severe injuries and $x_{4i}$ = % deaths. Let us assume a model based on the additive log-ratio (ALR) transformation $y_{1i} = \log(x_{2i}/x_{1i})$, $y_{2i} = \log(x_{3i}/x_{1i})$ and $y_{3i} = \log(x_{4i}/x_{1i})$, given by,

$$
\begin{aligned}
y_{1i} &= g(\beta_1, z_i) + w_i + \varepsilon_{1i} \\
y_{2i} &= g(\beta_2, z_i) + w_i + \varepsilon_{2i} \\
y_{3i} &= g(\beta_3, z_i) + w_i + \varepsilon_{3i}
\end{aligned}
\tag{1}
$$

where $\beta_1$, $\beta_2$ and $\beta_3$ are vectors of regression parameters, $z_i$ is a covariate vector associated with the $i$th observation, $i = 1, 2, \ldots, 432$, $w_i$ is a random effect (latent unobserved variable) that captures the dependency among the proportions for each state/month, and $\varepsilon_{ji}$, $j = 1, 2, 3$, are errors (non-observed variables) assumed to be independent random variables with normal distributions $N(0, \sigma_j^2)$. Different distributions could be assumed for the random effects $w_i$; in this study, a normal distribution $N(0, \sigma_w^2)$ is assumed.
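The ALR transformation that produces the three responses from a four-part composition can be sketched in a few lines. This is an illustrative helper, not the authors' code; the function name `alr` and the example composition are assumptions made here.

```python
import math

def alr(x):
    """Additive log-ratio transform with the first component as reference."""
    if any(v <= 0 for v in x):
        raise ValueError("ALR needs strictly positive proportions")
    if abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to one")
    ref = x[0]
    # y_j = log(x_{j+1} / x_1) for the remaining components
    return [math.log(v / ref) for v in x[1:]]

# Made-up composition: (unharmed, mild, severe, death)
x = (0.40, 0.40, 0.15, 0.05)
y1, y2, y3 = alr(x)
```

A response equal to zero means the class has the same proportion as the reference class (unharmed); negative values mean a smaller proportion than the reference.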

For a hierarchical Bayesian analysis of the model, normal prior distributions with known hyperparameter values are assumed for the regression parameters. In the second stage of the hierarchical Bayesian analysis, a gamma prior distribution is assumed for the inverse of the variance $\sigma_w^2$ of the latent variable $w_i$, that is,

$$
\tau_w \sim G(a_w, b_w)
\tag{2}
$$

where $G(a, b)$ denotes a gamma distribution with mean $a/b$ and variance $a/b^2$; $\tau_w = 1/\sigma_w^2$; and $a_w$ and $b_w$ are known hyperparameters. Further, prior independence among the parameters is assumed.
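The way the shared random effect induces dependence among the three responses of model (1) can be made explicit with a short calculation. The following is a sketch that conditions on the regression parameters and on the variances, and assumes that $w_i$ is independent of the errors; for responses $j \neq k$ of the same observation $i$:

$$
\begin{aligned}
\operatorname{Var}(y_{ji}) &= \sigma_w^2 + \sigma_j^2, \\
\operatorname{Cov}(y_{ji}, y_{ki}) &= \operatorname{Var}(w_i) = \sigma_w^2, \\
\operatorname{Corr}(y_{ji}, y_{ki}) &= \frac{\sigma_w^2}{\sqrt{(\sigma_w^2 + \sigma_j^2)(\sigma_w^2 + \sigma_k^2)}} .
\end{aligned}
$$

Under this structure the correlation between any two transformed responses is non-negative, and it vanishes when $\sigma_w^2 = 0$, which corresponds to independent responses.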

Posterior summaries of interest for model (1) are obtained from simulated samples of the joint posterior distribution of the model parameters using MCMC methods. The simulation algorithm is built from the complete conditional posterior distribution of each parameter. The simulation procedure is greatly simplified by using existing Bayesian simulation software such as OpenBUGS (see, for example, Lunn et al., 2009), where only the joint distribution for the observations and the prior distributions for the parameters of the assumed model need to be specified.

Associated with the compositional data, there are some covariates such as month, year and region of Brazil where the accident occurred. In addition to these covariates, other independent variables of interest may also be associated with the compositional responses, such as road condition, road layout, weather condition, and accident time. An important covariate in the occurrence of road accidents is the condition of the pavement. Table 3 presents road pavement conditions, based on samples of a few kilometers of highways in each federative unit of Brazil, as published in the “CNT 2018 Highways Survey” for the year 2018.

Table 3
Condition of the pavement - total length evaluated.

For the analysis of the compositional data given in Table A.1, the following covariates are assumed: month, year, the percentages of pavement rated good, regular (fair), bad and very bad (lousy) (the optimum percentage is omitted because of the constraint %optimum + %good + %regular + %bad + %very bad = 1) and dummy variables for the northeast (1 for NE and 0 otherwise), midwest (1 for CO and 0 otherwise), southeast (1 for SE and 0 otherwise) and south (1 for S and 0 otherwise) regions, with the northern region (N) as the reference.
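The covariate coding described above can be sketched as a function that builds one covariate vector per observation. This is only an illustration: the function names, the dictionary keys, the numeric coding of month (1 to 12) and year (0 for 2018, 1 for 2019) and the pavement shares in the example are assumptions, since the text does not state how month and year were coded.

```python
REGION_DUMMIES = ("NE", "CO", "SE", "S")  # north (N) is the reference level

def design_row(month, year, pavement, region):
    """Build the covariate vector z_i, intercept first.

    `pavement` maps condition -> share of evaluated length; the optimum
    share is dropped because the five shares sum to one.
    """
    if abs(sum(pavement.values()) - 1.0) > 1e-9:
        raise ValueError("pavement shares must sum to one")
    row = [1.0, float(month), float(year)]
    row += [pavement[k] for k in ("good", "regular", "bad", "very_bad")]
    row += [1.0 if region == r else 0.0 for r in REGION_DUMMIES]
    return row

def linear_predictor(beta, z):
    """Inner product of an 11-element coefficient vector and covariates."""
    if len(beta) != len(z):
        raise ValueError("beta and z must have the same length")
    return sum(b * v for b, v in zip(beta, z))

# Made-up pavement shares for one federative unit.
pav = {"optimum": 0.30, "good": 0.25, "regular": 0.25,
       "bad": 0.15, "very_bad": 0.05}
z = design_row(month=3, year=1, pavement=pav, region="NE")
z_north = design_row(month=3, year=1, pavement=pav, region="N")
```

For an observation in the northern region all four dummies are zero, so the region coefficients measure differences relative to the north.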

In the data analysis, a regression model for the compositional data (1) without the latent factor $w_i$, denoted “model 1”, is assumed first, that is, the responses of the additive log-ratio (ALR) transformation $y_{1i} = \log(x_{2i}/x_{1i})$, $y_{2i} = \log(x_{3i}/x_{1i})$ and $y_{3i} = \log(x_{4i}/x_{1i})$ are assumed to be independent, where $x_{1i}$ = % unharmed, $x_{2i}$ = % minor injuries, $x_{3i}$ = % severe injuries and $x_{4i}$ = % deaths. Thus, the following linear regression models are assumed:

$$
\begin{aligned}
y_{1i} &= g(\beta_1, z_i) + \varepsilon_{1i} \\
y_{2i} &= g(\beta_2, z_i) + \varepsilon_{2i} \\
y_{3i} &= g(\beta_3, z_i) + \varepsilon_{3i}
\end{aligned}
\tag{3}
$$

where,

$$
\begin{aligned}
g(\beta_1, z_i) = {} & \beta_{11} + \beta_{12}\,\text{month}_i + \beta_{13}\,\text{year}_i + \beta_{14}\,\%\text{good.pav}_i + \beta_{15}\,\%\text{regular.pav}_i + \beta_{16}\,\%\text{bad.pav}_i \\
& + \beta_{17}\,\%\text{lousy.pav}_i + \beta_{18}\,\text{region.NE}_i + \beta_{19}\,\text{region.CO}_i + \beta_{1,10}\,\text{region.SE}_i + \beta_{1,11}\,\text{region.S}_i, \\
g(\beta_2, z_i) = {} & \beta_{21} + \beta_{22}\,\text{month}_i + \beta_{23}\,\text{year}_i + \beta_{24}\,\%\text{good.pav}_i + \beta_{25}\,\%\text{regular.pav}_i + \beta_{26}\,\%\text{bad.pav}_i \\
& + \beta_{27}\,\%\text{lousy.pav}_i + \beta_{28}\,\text{region.NE}_i + \beta_{29}\,\text{region.CO}_i + \beta_{2,10}\,\text{region.SE}_i + \beta_{2,11}\,\text{region.S}_i, \\
g(\beta_3, z_i) = {} & \beta_{31} + \beta_{32}\,\text{month}_i + \beta_{33}\,\text{year}_i + \beta_{34}\,\%\text{good.pav}_i + \beta_{35}\,\%\text{regular.pav}_i + \beta_{36}\,\%\text{bad.pav}_i \\
& + \beta_{37}\,\%\text{lousy.pav}_i + \beta_{38}\,\text{region.NE}_i + \beta_{39}\,\text{region.CO}_i + \beta_{3,10}\,\text{region.SE}_i + \beta_{3,11}\,\text{region.S}_i
\end{aligned}
\tag{4}
$$

and $\varepsilon_{ji}$ are errors assumed to be independent with normal distributions $N(0, \sigma_j^2)$, $j = 1, 2, 3$.

Given the ALR transformation $y_{1i} = \log(x_{2i}/x_{1i})$, $y_{2i} = \log(x_{3i}/x_{1i})$ and $y_{3i} = \log(x_{4i}/x_{1i})$, and true proportions $p_{1i}$, $p_{2i}$, $p_{3i}$ and $p_{4i}$ satisfying $p_{1i} + p_{2i} + p_{3i} + p_{4i} = 1$, the estimated proportions in each class are easily obtained by inverting the transformation:

$$
\begin{aligned}
\hat{p}_{1i} &= \frac{1}{1 + \exp(\hat{y}_{1i}) + \exp(\hat{y}_{2i}) + \exp(\hat{y}_{3i})}, \\
\hat{p}_{2i} &= \frac{\exp(\hat{y}_{1i})}{1 + \exp(\hat{y}_{1i}) + \exp(\hat{y}_{2i}) + \exp(\hat{y}_{3i})}, \\
\hat{p}_{3i} &= \frac{\exp(\hat{y}_{2i})}{1 + \exp(\hat{y}_{1i}) + \exp(\hat{y}_{2i}) + \exp(\hat{y}_{3i})}, \\
\hat{p}_{4i} &= \frac{\exp(\hat{y}_{3i})}{1 + \exp(\hat{y}_{1i}) + \exp(\hat{y}_{2i}) + \exp(\hat{y}_{3i})}
\end{aligned}
\tag{5}
$$

where $\hat{y}_{1i}$, $\hat{y}_{2i}$ and $\hat{y}_{3i}$ are predicted values based on the estimated model.
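The inverse transformation in (5) is straightforward to compute. The sketch below is illustrative only; the function name `alr_inverse` and the input values are assumptions made here, not output of the fitted model.

```python
import math

def alr_inverse(y):
    """Map ALR coordinates (y1, y2, y3) back to four proportions, as in (5)."""
    expy = [math.exp(v) for v in y]
    denom = 1.0 + sum(expy)
    # The first entry is the reference class (unharmed).
    return [1.0 / denom] + [e / denom for e in expy]

# Made-up predicted values on the ALR scale.
p = alr_inverse([0.0, -1.0, -2.0])
```

By construction the four estimated proportions are positive and sum to one, so predictions made on the unconstrained ALR scale always map back to a valid composition.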

Independent normal prior distributions $N(0, 1)$ are assumed for all regression parameters and gamma distributions $G(1, 1)$ for the variances of the errors $\varepsilon_{1i}$, $\varepsilon_{2i}$ and $\varepsilon_{3i}$. Table 4 shows the posterior summaries of interest (Monte Carlo estimates given by the posterior means, posterior standard deviations and 95% credible intervals of the parameters) based on 1000 simulated Gibbs samples of the joint posterior distribution of all model parameters, obtained using the OpenBUGS software. The 1000 samples were obtained by taking every 100th sample among 100,000 generated Gibbs samples, to get an approximately uncorrelated sample, after discarding a burn-in sample of size 11,000 to eliminate the effect of the initial parameter values needed for the MCMC algorithm. Convergence of the MCMC simulated samples was monitored by trace plots of the generated Gibbs samples (see Gelman et al., 2013).
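The burn-in, thinning and summary steps described above can be illustrated on a simulated chain. This sketch does not use OpenBUGS output: the chain consists of independent normal draws used as a stand-in, the helper names are hypothetical, and it assumes that the 11,000 burn-in draws precede the 100,000 draws that are thinned. The credible interval uses simple empirical quantiles.

```python
import random
import statistics

def thin(chain, burn_in, step):
    """Drop the first `burn_in` draws, then keep every `step`-th draw."""
    kept = chain[burn_in:]
    return kept[step - 1::step]

def summarize(draws):
    """Posterior mean, standard deviation and 95% credible interval."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]
    upper = ordered[min(n - 1, int(0.975 * n))]
    return statistics.mean(draws), statistics.stdev(draws), (lower, upper)

random.seed(1)
# Stand-in for one parameter's chain: independent normal draws, not real
# OpenBUGS output. 11,000 burn-in draws followed by 100,000 further draws.
chain = [random.gauss(0.5, 0.2) for _ in range(111_000)]
sample = thin(chain, burn_in=11_000, step=100)
mean, sd, (lo, hi) = summarize(sample)
```

An effect is read as significant in the sense used in this paper when the interval `(lo, hi)` for the corresponding regression parameter does not contain zero.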

Table 4
Posterior summaries - “model 1”.

From the results presented in Table 4, it is observed that the significant effects (zero not included in the 95% credible intervals) are:

  • Response $y_2 = \log(x_3/x_1)$, where $x_1$ = % unharmed and $x_3$ = % serious injuries: very bad (lousy) pavement (regression parameter $\beta_{27}$ is estimated by a negative value) and the NE (northeast) region, where the regression parameter $\beta_{28}$ is estimated by a positive value, indicating that the ratio of $x_3$ = % serious injuries to $x_1$ = % unharmed increases in the NE region when compared to the N region (north, taken as the reference).

  • Response $y_3 = \log(x_4/x_1)$, where $x_1$ = % unharmed and $x_4$ = % deaths: the covariate year (regression parameter $\beta_{33}$ is estimated by a negative value, indicating a decrease in the death/unharmed ratio in the year 2019); the NE (northeast) region, where the regression parameter $\beta_{38}$ is estimated by a positive value; the CO (midwest) region, where the regression parameter $\beta_{39}$ is estimated by a positive value; the SE (southeast) region, where the regression parameter $\beta_{3,10}$ is estimated by a positive value; and the S (south) region, where the regression parameter $\beta_{3,11}$ is also estimated by a positive value. In each of these four regions the positive estimate indicates that the ratio of $x_4$ = % deaths to $x_1$ = % unharmed increases when compared to the N region (north, taken as the reference).

Now consider the regression model for the compositional data defined by (1) and (4) in the presence of the latent factor $w_i$, denoted “model 2”, that is, assuming dependence among the responses. The random factor $w_i$ has a normal distribution $N(0, \sigma_w^2)$, and a gamma distribution $G(1, 1)$ is assumed for the variance $\sigma_w^2$. Table 5 shows the posterior summaries of interest based on 1000 simulated Gibbs samples of the joint posterior distribution of all model parameters, obtained using the OpenBUGS software. The 1000 samples were obtained by taking every 400th sample among 400,000 generated Gibbs samples, to get an approximately uncorrelated sample, after discarding a burn-in sample of size 111,000 to eliminate the effect of the initial parameter values needed for the MCMC algorithm. Convergence of the MCMC simulated samples was monitored by trace plots of the generated Gibbs samples.

Table 5
Posterior summaries - “model 2”.

From the results presented in Table 5, it is observed that the significant effects (zero not included in the 95% credible intervals) are the same as those obtained using “model 1”.

To select the best model, the Deviance Information Criterion (DIC) is used. The DIC (Spiegelhalter et al., 2014) is based on the posterior mean of the deviance. The deviance is defined by,

$$
D(\theta) = -2 \log L(\theta) + C
\tag{6}
$$

where $\theta$ is a vector of unknown parameters of the model, $L(\theta)$ is the likelihood function and $C$ is a constant (not always known) that cancels out when two models are compared. The DIC is defined by,

$$
DIC = D(\hat{\theta}) + 2 p_D
\tag{7}
$$

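where $\hat{\theta} = E(\theta \mid y)$ is the posterior mean of $\theta$, $D(\hat{\theta})$ is the deviance evaluated at the posterior mean, and $p_D$ is the effective number of model parameters, given by $p_D = \bar{D} - D(\hat{\theta})$, where $\bar{D} = E[D(\theta) \mid y]$ is the posterior mean of the deviance, which measures the quality of the fit of each model to the data.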
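The computation of the DIC from posterior draws can be sketched on a toy problem. The example below is not the compositional model of this paper: it assumes a normal model with known standard deviation and a single unknown mean, a flat prior (so the posterior can be sampled directly instead of by MCMC), and $C = 0$. The function names and the simulated data are hypothetical.

```python
import math
import random
import statistics

def deviance(mu, data, sigma=1.0):
    """D(mu) = -2 log L(mu) for a normal model with known sigma (C = 0)."""
    n = len(data)
    sq = sum((y - mu) ** 2 for y in data)
    return n * math.log(2 * math.pi * sigma ** 2) + sq / sigma ** 2

def dic(draws, data):
    """Return (DIC, p_D) from posterior draws of mu."""
    d_bar = statistics.mean(deviance(mu, data) for mu in draws)
    d_hat = deviance(statistics.mean(draws), data)
    p_d = d_bar - d_hat
    return d_hat + 2 * p_d, p_d

random.seed(7)
data = [random.gauss(2.0, 1.0) for _ in range(50)]
# With a flat prior the posterior of mu is N(ybar, sigma^2 / n); draw from it
# directly instead of running an MCMC sampler.
ybar = statistics.mean(data)
draws = [random.gauss(ybar, 1.0 / math.sqrt(len(data))) for _ in range(5000)]
dic_value, p_d = dic(draws, data)
```

In this toy case `p_d` comes out close to one, matching the single unknown parameter. When two models are compared on the same data, the one with the smaller DIC is preferred.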

Table 6 shows the DIC values obtained from the generated Gibbs samples using the OpenBUGS software for both models considered in the data analysis.

Table 6
DIC estimates for model 1 and model 2

From the results of Table 6, it can be observed that “model 2” gives the better fit to the data. Assuming “model 2”, the estimated proportions for the four classes given by (5) and the observed proportions are presented in Figure 4. The plots of Figure 4 show a good fit of model 2 to the compositional data on accident victims on Brazilian federal roads.

Figure 4
Estimated and observed proportions (unharmed, mild, severe, deaths).

3 DISCUSSION OF THE RESULTS AND CONCLUDING REMARKS

The results obtained using ALR compositional models lead to important conclusions. The significant covariates affecting the responses $y_{2i} = \log(x_{3i}/x_{1i})$ and $y_{3i} = \log(x_{4i}/x_{1i})$, where $x_{1i}$ = % unharmed, $x_{2i}$ = % minor injuries, $x_{3i}$ = % severe injuries and $x_{4i}$ = % deaths, are very bad (lousy) pavement, NE region and year. Figures 5, 6, 7 and 8 show the scatter plots associated with each response and covariate, from which it is possible to get important interpretations of the compositional multivariate dataset.

Figure 5
Graphs of y2=log (severe/unharmed) versus %lousy pavement and NE region.

Figure 6
Graphs of y3=log (death/unharmed) versus year, NE region, CO region, SE region and S region.

Figure 7
Graphs of %severe injury and %unharmed versus very bad pavement and NE region.

Figure 8
Graphs of %death and %unharmed versus year, NE region, CO region, SE region and S region.

From the graphs of Figure 5, it is possible to observe that although there is great variability in the response y 2i there is a slight decreasing in the response with %lousy pavement (pavement in very poor condition) and an increasing in the response y 2i in the NE region when compared to the other regions of Brazil.

From the graphs of Figure 6, it is possible to see an increasing in the response y 3i in the year 2019 when compared to the year 2018; an increasing in the response y 3i in the NE region when compared to the other regions of Brazil; a decreasing in the response y 3i in the CO region when compared to the other regions of Brazil; an apparently decreasing in the response y 3i in the SE region when compared to the other regions of Brazil and an apparently decreasing in the response y 3i in the S region when compared to the other regions of Brazil.

From the graphs of Figure 7, it is possible to see an increase in %severe injuries and a decrease in %unharmed persons in the NE region compared with the other regions of Brazil; for the factor very bad (lousy) pavement, it is difficult to see any effect on %unharmed or %severe injuries.

From the graphs of Figure 8, it is possible to observe that, although there is great variability in the responses %deaths and %unharmed, there is a small increase in %deaths in 2019 compared with 2018; similarly, there is an apparent increase in %deaths in the NE region compared with the other regions of Brazil, and a decrease in %deaths in the SE and S regions.

In summary, the results indicate that the rates of serious accidents and deaths are affected by some covariates: the regions of Brazil (especially the NE region, where the rates of accidents with severe injuries and deaths are higher than in the other regions), the year and, to a slight extent, the pavement conditions of the roads. These findings could help road managers make decisions to improve road conditions in Brazil.

This is an important result that could help to reduce the high rates of severe injuries and deaths on Brazilian federal roads.

As concluding remarks, the use of existing compositional Bayesian models could be of great interest in the analysis of road accident data, as seen in this study. Other prior distributions could be considered for the parameters of the model, possibly incorporating the prior opinions of expert road traffic engineers. The use of MCMC methods to obtain the posterior summaries of interest with freely available simulation software such as OpenBUGS could be a good option in a hierarchical Bayesian data analysis, since it only requires the specification of the likelihood function and the prior distributions for the parameters of the model. Other dependence structures could also be assumed for the ALR-transformed data, such as a multivariate normal distribution for the errors in the compositional model (see, for example, Shimizu et al., 2015).
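To make the correlated-error alternative mentioned above concrete, the following is a minimal sketch of an ALR regression with multivariate normal errors. The notation is assumed here for illustration and is not taken from the model fitted in this study: z_{ki} denotes the k-th covariate for observation i, β_{jk} the regression coefficients, Σ the error covariance matrix, and y_{1i} is taken to be log(x2i/x1i) by analogy with y2i and y3i.

```latex
\begin{aligned}
y_{ji} &= \beta_{j0} + \sum_{k=1}^{p} \beta_{jk} z_{ki} + \varepsilon_{ji}, \qquad j = 1, 2, 3, \\
(\varepsilon_{1i}, \varepsilon_{2i}, \varepsilon_{3i})^{\top} &\sim N_3(\mathbf{0}, \Sigma).
\end{aligned}
```

With a diagonal Σ the three log-ratios are modeled with uncorrelated errors; a full Σ lets the errors of the three log-ratios be correlated within each observation.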

In future work, the results of this study could be extended to include other covariates, such as weather conditions, type of road (double lane and single lane), roads with and without tolls, period of the day, speed of the vehicle at the moment of the accident and many other possible covariates that could affect the responses given by a proportion vector (x1i=% unharmed, x2i=% minor injuries, x3i=% severe injuries and x4i=% deaths).

Acknowledgments

The authors are very grateful for the reviewers’ comments that led to a great improvement of the manuscript.

References

  • 1
    ABREU DROM, SOUZA EM & MATHIAS TAF. 2018. Impacto do Código de Trânsito Brasileiro e da Lei Seca na mortalidade por acidentes de trânsito. Cadernos de Saúde Pública, 34: e00122117.
  • 2
    AITCHISON J. 1982. The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44(2): 139-160.
  • 3
    ANDRADE SSCA & MELLO-JORGE MHP. 2016. Mortality and potential years of life lost by road traffic injuries in Brazil, 2013. Revista de Saúde Pública, 50: 59.
  • 4
    ASSIMIZELE B, BYE RT ET AL. 2020. Minimizing the Environmental Risk from Oil Tanker Grounding Accidents in the High North. American Journal of Operations Research, 10(03): 83.
  • 5
    AITCHISON J & SHEN SM. 1980. Logistic-normal distributions: Some properties and uses. Biometrika, 67(2): 261-272.
  • 6
    BACCHIERI G & BARROS AJ. 2011. Acidentes de trânsito no Brasil de 1998 a 2010: muitas mudanças e poucos resultados. Revista de Saúde Pública, 45(5): 949-963.
  • 7
    BAHADORIMONFARED A, SOORI H, MEHRABI Y, DELPISHEH A, ESMAILI A, SALEHI M & BAKHTIYARI M. 2013. Trends of Fatal Road Traffic Injuries in Iran (2004-2011).
  • 8
    BARROS AJ, AMARAL RL, OLIVEIRA MSB, LIMA SC & GONÇALVES EV. 2003. Acidentes de trânsito com vítimas: sub-registro, caracterização e letalidade. Cadernos de Saúde Pública, 19: 979-986.
  • 9
    BAYKAL-GÜRSOY M, XIAO W & OZBAY K. 2009. Modeling traffic flow interrupted by incidents. European Journal of Operational Research, 195(1): 127-138.
  • 10
    BHALLA K, SHOTTEN M, COHEN A, BRAUER M, SHAHRAZ S, BURNETT R, LEACH-KEMON K, FREEDMAN G & MURRAY C. 2014. Transport for health: the global burden of disease from motorized road transport.
  • 11
    BOX GE & COX DR. 1964. An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2): 211-243.
  • 12
    CASTRO ARAGÓN FR & LEAL JE. 2003. Alocação de fluxos de passageiros em uma rede de transporte público de grande porte formulado como um problema de inequações variacionais. Pesquisa Operacional, 23(2): 235-264.
  • 13
    EGOZCUE JJ, PAWLOWSKY-GLAHN V, MATEU-FIGUERAS G & BARCELO-VIDAL C. 2003. Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279-300.
  • 14
    GELFAND AE & SMITH AF. 1990. Sampling-based approaches to calculating marginal densities. Journal of the American statistical association, 85(410): 398-409.
  • 15
    GELMAN A, CARLIN JB, STERN HS, DUNSON DB, VEHTARI A & RUBIN DB. 2013. Bayesian data analysis. CRC press.
  • 16
    HAAGSMA JA, GRAETZ N, BOLLIGER I, NAGHAVI M, HIGASHI H, MULLANY EC, ABERA SF, ABRAHAM JP, ADOFO K, ALSHARIF U ET AL. 2016. The global burden of injury: incidence, mortality, disability-adjusted life years and time trends from the Global Burden of Disease study 2013. Injury prevention, 22(1): 3-18.
  • 17
    HAASTRUP P. 1994. Overview of problems of risk management of accidents with dangerous chemicals in Europe. European Journal of Operational Research, 75(3): 488-498.
  • 18
    IYENGAR M & DEY DK. 1996. Bayesian analysis of compositional data. Department of Statistics, University of Connecticut, Storrs, CT, pp. 06269-3120.
  • 19
    IYENGAR M & DEY DK. 1998. Box-Cox transformations in Bayesian analysis of compositional data. Environmetrics: The official journal of the International Environmetrics Society, 9(6): 657-671.
  • 20
    JOHNSON RA, WICHERN DW ET AL. 2002. Applied multivariate statistical analysis. vol. 5. Prentice hall Upper Saddle River, NJ.
  • 21
    JORGE M, KOIZUMI MS, TUONO VL ET AL. 2008. Acidentes de trânsito no Brasil: a situação nas capitais.
  • 22
    JORGE M, KOIZUMI MS ET AL. 2009. Acidentes de trânsito causando vítimas: possível reflexo da lei seca nas internações hospitalares. Revista ABRAMET, 27(2): 16-25.
  • 23
    JORGE M, LATORRE MR ET AL. 1994. Acidentes de trânsito no Brasil: dados e tendências. Cadernos de Saúde Pública, 10: S19-S44.
  • 24
    KLEIN CH. 1994. Mortes no trânsito do Rio de Janeiro, Brasil. Cadernos de Saúde Pública, 10: S168-S176.
  • 25
    LUNN D, SPIEGELHALTER D, THOMAS A & BEST N. 2009. The BUGS project: Evolution, critique and future directions. Statistics in medicine, 28(25): 3049-3067.
  • 26
    LYONS RA, WARD H, BRUNT H, MACEY S, THOREAU R, BODGER O & WOODFORD M. 2008. Using multiple datasets to understand trends in serious road traffic casualties. Accident Analysis & Prevention, 40(4): 1406-1410.
  • 27
    MALTA DC, SILVA MMAD & BARBOSA J. 2012. Violências e acidentes, um desafio ao Sistema Único de Saúde. Ciência & Saúde Coletiva, 17(9): 2220-2220.
  • 28
    MARÍN L & QUEIROZ MS. 2000. A atualidade dos acidentes de trânsito na era da velocidade: uma visão geral. Cadernos de Saúde Pública, 16: 7-21.
  • 29
    MARÍN-LEÓN L, BELON AP, BARROS MBDA, ALMEIDA SDDM & RESTITUTTI MC. 2012. Tendência dos acidentes de trânsito em Campinas, São Paulo, Brasil: importância crescente dos motociclistas. Cadernos de Saúde Pública, 28(1): 39-51.
  • 30
    MARTÍN FERNÁNDEZ JA, DAUNIS I ESTADELLA J & MATEU I FIGUERAS G. 2015. On the interpretation of differences between groups for compositional data. SORT: Statistics and Operations Research Transactions, 39(2): 231-252.
  • 31
    MARTINEZ EZ, ACHCAR JA, ARAGON DC & BRUNHEROTTI MA. 2020. A Bayesian analysis for pseudo-compositional data with spatial structure. Statistical Methods in Medical Research, 29(5): 1386-1402.
  • 32
    MARTÍNEZ F, BALDOQUÍN MG & MAUTTONE A. 2017. And solution method to a simultaneous route design and frequency setting problem for a bus rapid transit system in Colombia. Pesquisa Operacional, 37(2): 403-434.
  • 33
    MEKKER M, LI H, COX E, BULLOCK D ET AL. 2018. Dashboards for Monitoring Congestion and Crashes in Interstate Work Zones. American Journal of Operations Research, 9(1): 15-30.
  • 34
    NOVAES AG. 2001. Rapid-transit efficiency analysis with the assurance-region DEA method. Pesquisa Operacional, 21(2): 179-197.
  • 35
    RAYENS WS & SRINIVASAN C. 1991. Box-Cox transformations in the analysis of compositional data. Journal of Chemometrics, 5(3): 227-239.
  • 36
    SHIMIZU TK, LOUZADA F, SUZUKI AK & EHLERS RS. 2015. Modeling Compositional Regression with uncorrelated and correlated errors: a Bayesian approach. arXiv preprint arXiv:1507.00225.
  • 37
    SILVA S & ANDRADE S. 1996. Acidentes de trânsito: Problema prioritário de saúde. A Construção do SUS a partir do Município, pp. 95-99.
  • 38
    SMITH AF & ROBERTS GO. 1993. Bayesian computation via the Gibbs sampler and related Markov Chain Monte Carlo methods. Journal of the Royal Statistical Society. Series B (Methodological), pp. 3-23.
  • 39
    SPIEGELHALTER DJ, BEST NG, CARLIN BP & VAN DER LINDE A. 2014. The deviance information criterion: 12 years on. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(3): 485-493.
  • 40
    SZWED P, VAN DORP JR, MERRICK JR, MAZZUCHI TA & SINGH A. 2006. A Bayesian paired comparison approach for relative accident probability assessment with covariate information. European Journal of Operational Research, 169(1): 157-177.
  • 41
    TJELMELAND H & LUND KV. 2003. Bayesian modelling of spatial compositional data. Journal of Applied Statistics, 30(1): 87-100.
  • 42
    WAISELFISZ JJ. 2013. Mapa da violência 2013: acidentes de trânsito e motocicletas. Rio de Janeiro.
  • 43
    WORLD HEALTH ORGANIZATION. 2004. International statistical classification of diseases and related health problems. vol. 1. World Health Organization.
  • 44
    WORLD HEALTH ORGANIZATION. 2018. Global status report on road safety 2018. World Health Organization.
  • 45
    WORLD HEALTH ORGANIZATION ET AL. 2018. Noncommunicable diseases country profiles 2018. World Health Organization.

Publication Dates

  • Publication in this collection
    07 Dec 2020
  • Date of issue
    2020

History

  • Received
    28 Oct 2019
  • Accepted
    19 Aug 2020
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br