Dynamic Regression Model for Evaluating the Association Between Atmospheric Conditions and Deaths due to Respiratory Diseases in São Paulo , Brazil

The article reports the modeling of mortality due to respiratory diseases emanating from atmospheric conditions, capturing significant associations and verifying the ability of stochastic modeling to predict deaths arising from the relationship between weather conditions and air pollution. The statistical methods used in the analysis were cross-correlation and pre-whitening, in addition to dynamic regression modeling combining the dynamics of time series and the effect of explanatory variables. The results show there are significant associations between mortality and sulfur dioxide, air temperature, atmospheric pressure, relative humidity, and autoregressive structure. The cross-correlations captured significant lags between atmospheric variables and deaths, of two months for SO2 and relative humidity, eleven months for PM10, seven months for O3, and eight months for air temperature and the cross-correlation without lag with NO2. With CO variables, precipitation and atmospheric pressure, cross-correlations were not detected. Stochastic modeling showed that deaths due to respiratory diseases can be predicted from the combination of meteorological and air pollution variables, especially considering the existing trend and seasonality.


Introduction
It is known that weather or atmospheric conditions are just a few among many factors that influence human health.According to the World Health Organization (WHO, 2013), good human health depends on the continued stability of ecological and physical systems of the biosphere.This stability is increasingly threatened, with consequent vulnerability to health, as the human species becomes more urbanized and detached from natural systems.
Air pollution has been linked to increased demand for health services and more deaths due to various factors in large urban centers, combined with other factors such as sanitation and social inequalities that deteriorate the quality of life, mainly in developing countries (Cakmak et al., 2006;Nastos and Matzarakis, 2006).
The uninterrupted quest for comfort after the industrial revolution, according to estimates by the IPCC (2013), is the main cause of global warming and therefore of climate variability changes.The air quality in large urban areas is on the whole becoming worse, and future scenarios associated with climate changes include greater instability related to air pollution.So, while globally everyone will feel the impacts of climate change, those living in great urban centers will be the first ones, especially due to high levels of air pollution (Confalonieri et al., 2009).Sicard et al. (2011) state that the adverse effects to human health related to air pollution are mainly due to the sum of several pollutants, which are not always easy to quantify.Although the effects are not well defined in exposed individuals, some acute effects have been studied in previously healthy individuals and in individuals with a chronic cardiovascular or respiratory disease (Esquivel et al., 2011).
In general the most affected groups are children and elderly individuals, due to the development of the organism during childhood and the inherent problems of aging.Kinney and Ozkaynak (1991), Saldiva et al. (1995) Burkart and Endlicher (1999), Filleul et al. (2004), Ostro et al. (2006), Sujaritpong et al. (2014).Respiratory diseases are one of the largest health problems worldwide.Hundreds of millions of people of all ages suffer from respiratory diseases and allergies, with over 500 million of them living in developing countries.
In particular, the prevalence of respiratory diseases is increasing in the city of São Paulo (Ferrer et al., 2010).This increase affects the quality of life and can cause problems not only in affected individuals but their families as well (WHO, 2013).
The World Health Organization and the World Bank estimate that four million people suffering from respiratory diseases may have died prematurely in 2005 and the projections are for substantial increase in the number of deaths in the future.These forecasts are associated with climate changes that increase vulnerability of people (IPCC, 2013).In Sao Paulo, for example, air pollution doubles the risk of children developing asthma (Pereira et al., 1998;Gouveia et al., 2004).
According to Saldiva et al. (2008) the effect of air pollution is both immediate (e.g., difficulty breathing and eye irrigation) and cumulative on the body and can accelerate the development of diseases.The air constantly polluted aggravates the development of certain respiratory diseases, resulting in a group of symptoms affecting various organs such as the nose and throat, increasing cases of asthma and sinusitis (Lelieveld et al., 2013).
Given the evidence that many health outcomes are sensitive to climate changes, it is inevitable that climate change will affect peoples' health (IPCC, 2013).In the present study we considered the combined effect of weather and air pollution to possibly explain the mortality of elderly individuals due to respiratory diseases.More particularly, we modeled the number of deaths due to respiratory diseases influenced by atmospheric pollutants and meteorological variables from January 2000 to December 2010 in the city of São Paulo.Stochastic modeling via dynamic regression was used, taking into consideration besides the stochastic process, the explanatory variables with their lags to improve the accuracy of the estimates, in order to adjust the current distribution of deaths through known atmospheric variables, allowing extrapolation of the results and checking possible impacts of the weather.The results contribute through a robust statistical method able to relate climate and health variables, as a tool for researchers and public policymakers.

Material and Methods
The data of this research on mortality due to multiple respiratory diseases (bronchitis, pulmonary emphysema and asthma), concentrations of pollutants and meteorological variables from January 2000 to December 2010 in the city of São Paulo were used.
The target population of the study was elderly individuals (60 years or older) living in São Paulo city.Monthly deaths diagnosed as caused by respiratory diseases were obtained from the Mortality Information System.Air pollutant measurements were collected from the telemetry network stations of the CETESB (Sanitation Company of the state of São Paulo).The pollutants included in this study were particulate matter (PM 10 ) (mg/m 3 ), ozone (O 3 ) (mg/m 3 ), sulfur dioxides (SO 2 ) (g/m 3 ), nitrogen (NO 2 ) (mg/m 3 ) and carbon monoxide (CO) (ppm).
The monthly concentrations of air pollutants are calculated daily and collected each hour for the CETESB telemetry network.However, some days didn't obtain measurements, for this reason, we chose to work with monthly averages.To calculate the average monthly was possible to consider 20 days at least every month during the study period.
São Paulo city, capital of the state of São Paulo, is one of the few places in Brazil that has a reliable database.It is the country's largest city in terms of population, with 11.9 million inhabitants according to the Instituto Brasileiro de Estatística e Geografia (2010), living in an area of 1,522.986km 2 .We were careful to select multiple weather stations to cover information of representative areas for the state capital as a whole, with 70% population coverage, according to CETESB (2014).
Data concerning air temperature (°C), relative humidity (%), rainfall (mm), and atmospheric pressure (hPa) were obtained from the weather database for teaching and research of the National Institute of Meteorology, which keeps historical weather data in digital form.
Besides descriptive statistics, the dynamic regression model (DRM) was used, which combines the dynamics of time series and the effect of explanatory variables.It is a mathematical regression model involving time series which includes not only current values of the variable under study but also past values (Pankratz, 1991).Furthermore, it is possible to include the explanatory variables and the response variable with or without lags in the model.Therefore, in DRM the dependent variable is explained by its lagged values and the current and past values of the explanatory variables.So, considering the lagged values of the variable (Y t-1 ) and its predictors (X n, t ) or lagged predictors (X n,t-k ), the following dynamic equation is used: (1) where the subscript t-i indicates the variables and parameters at time t with i lags.The equation includes a stochastic term (e t ) representing the variables not included in the model (residuals) as well as some fluctuations normally distributed and not significant to the model (Pankratz, 1991).
The estimation of the parameters is based on the Ordinary Least Squares method (OLS), since the stochastic error term (e t ) has the appropriate properties of zero mean (E(e t ) = 0), homoscedasticity (Var(e t ) = s2) and absence of correlation (E(e i ;e j ) = 0 if i > j) .
The DRM uses the bottom-up strategy, whereby a simple model is initially formulated and is then improved as new variables are included, until a suitable model is found (Zeileis, 2013).To determine the lag used in the DRM we used autocorrelation functions, cross-correlation and pre-whitening.The autocorrelation function is a measure of how much the value of a random variable is able to influence its neighbors.
The autocorrelation value ranges from -1 to 1, the closer to -1, more negative the correlation is and the closer to 1, the more positive it is.A 0 value means total absence of correlation (Newey, 1987).Assuming a time-dependent random variable X t , with mean m, its autocorrelation R(k) is defined as: where E is the value of the mathematical expectation, k is the time lag and s 2 is the variance of variable X t .
The cross-correlation is a measure of similarity between two signs of two variables in function of a lag applied to one of them, i.e., it reflects how the two processes are correlated.
Here we evaluated whether the cross-correlations between dependent variables, which also showed the presence of a strong seasonal factor.To avoid erroneous interpretations in the detection of lags, we used the pre-whitening technique, which is a process to remove unwanted autocorrelation data prior to the analysis of interest.
From this preliminary analysis, we fitted the dynamic regression model, incorporating variables simultaneously including all those presenting some degree of significance to the outcome studied.Therefore, we included these variables both at time t and with lags depending on this preliminary analysis.Finally, by a stepwise elimination procedure we only kept in the final model those variables that remained significant at a level of 5%.
Regarding the stochastic effect of the outcome, when using time series as here, the possible interdependence of the dependent variable at time t and t + 1 should be incorporated in the model.This is to take into account the probability of occurrence of death at time t + 1 conditional on the number of cases at time t death, as there are time-dependent observations.
From the analysis of the three models it was possible to check which could best portray the deaths observed and show the most consistent prediction.The quality of the model fit was analyzed by comparing the values of AIC (Akaike Information Criterion), which uses the Kullback-Leibler divergence to test whether a model is adequate, showing that the bias is asymptotically given by a number of parameters to be estimated in the model (Akaike, 1974).
We also applied the Kolmogorov-Smirnov (KS), Durbin Watson (DW) and Breusch-Pagan (BP) tests.The KS test is intended to determine whether a sample can be regarded as originating from a population with a certain distribution.In this study we used it to check whether the residuals followed a normal distribution (Marsaglia et al., 2003).DW is used to detect the presence of autocorrelation or residual dependency from a regression analysis.This test is based on the assumption that the errors in the regression model are generated by a first-order autoregressive process (Montgomery et al., 2001).Finally, the homoscedasticity assumption was measured using the Breusch-Pagan test (Tsay, 2005).All statistical techniques mentioned were performed with the aid of the free statistical software R 2.15.0.(R, 2008).

Results
During the study period, there were 23.456 deaths due to multiple causes (bronchitis, pulmonary emphysema and asthma) of elderly individuals (60 years or older) who lived in São Paulo.The months with the highest incidences of deaths were June, July and August, and the largest monthly number -418 notifications -occurred in June 2004 (Fig. 1).
The atmospheric, meteorological and elderly death variables (monthly mean values) are shown in Table 1.The monthly behavior of the variables in the study period indicated that on average the highest concentrations of air pol-lutants, except ozone, occurred from June to August, corresponding to winter in São Paulo.
It is important to highlight that the exposure to air pollution refers to contact with pollutant concentrations that an individual encounters over time (hours, days, months, etc.), and, therefore, the deleterious effects to human health occur both immediately and accumulatively, which may explain the lags in the cross-correlations.
The explanation to model the stochastic effect of variables via dynamic regression models lies in their own biological behavior of the effect of climate variables on the risk of dying due to a determined disease.It requires modeling the dependence between observations in time, since bi-  ologically death is manifested in the individual after a period of time subjected to adverse weather conditions on health.
It was considered that the atmospheric and meteorological variables first place to come after the assessment of deaths.So that the previous cross-correlation analysis provides an indication of which variables and that gap should be considered in the modeling.
Based on this preliminary analysis we have fitted the dynamic regression model, incorporating variables simultaneously including all those presenting some degree of significance to the outcome studied.Therefore, were included in both at time t and with lags depending on this preliminary analysis; and for a procedure of variables selection in the model (stepwise).
The cross-correlations with the aid of pre-whitening between death and the atmospheric variables were used to detect significant lags for inclusion in the DRM (Fig. 2).The significant lags were captured between atmospheric variables and number of deaths, of two months for SO 2 and rela-tive humidity, eleven months for PM 10 , seven months for O 3 , and eight months for air temperature and the cross-correlation without lag with NO 2 .No cross-correlation was detected for the covariables rainfall and atmospheric pressure.
With this knowledge, we studied the crosscorrelation between each of the independent variables and the possible outcome gap between them.The DRM was adjusted with all lags captured in the cross-correlations mentioned earlier and the autoregressive variable (Model 1) when modeling started.Model 1 was able to capture the seasonality of the series but the deaths were underestimated and the prediction of death could not contain the same structural pattern of the observed data.From Model 1, the adjustment of the DRM was performed only with the significant variables.It can be noted that the behavior of the predicted deaths are similar to Model 1, however prediction starts to capture the structure of the series of Model 2 (Fig. 3).
The evaluation tests of the models are presented in Table 2. Satisfying the assumptions of modeling via dy-  Table 3 reports the output of Model 3, indicating synergism between sulfur dioxide (p-value = 0.001), air temperature (p-value = 0.029), atmospheric pressure (p-value = 0.001), and relative humidity (p-value = 0.001), when modeled simultaneously the trend and seasonal functions and autoregressive variables.
Table 4 shows the observed deaths and those expected by the Model 3 for 2011.The model underestimates the number of cases in a few months, but it reproduces the seasonal pattern of deaths, predicting deaths in June, July and August, as in the observed data.Anyway, estimations can be made, despite the model's limitations.

Discussion
The present study showed a significant association between the variables of interest and adequately modeled the deaths explained by the synergism among the dependent variables sulfur dioxide, air temperature, atmospheric pressure and relative humidity.Given the lags captured by the pre-whitening technique, which allowed determination of death response time due to the increase and/or modification of either air pollution or weather condition scenario.These results are in agreement with the literature, which suggests the existence of a relationship between air pollution and mortality due to respiratory diseases.Studies have shown associations between exposure to classic pollutants (particulate matter, ozone, sulfur dioxide) and human health in general (Cakmak et al., 2011;Sicard et al., 2011;Lelieveld et al., 2013).
Table 1 shows that the largest number of deaths occurred in the winter season.This period is characterized by mild temperatures, atmospheric pressure, reduced rainfall and variation of relative humidity, which is a favorable scenario to the emergence and/or aggravation of respiratory diseases in elderly individuals.These findings for São Pau-  lo are consistent with previous studies (Esquivel et al., 2011).
Many authors, such as Bi et al. (2003), Ajdacic-Gross et al. (2007) andHuang et al. (2011), advocate the use of cross-correlation to capture the dependence of lagged monthly observations between meteorological variables and disease.It is important to highlight that the exposure to air pollution refers to contact with pollutant concentrations that an individual encounters over time (hours, days, months, etc.), and therefore the deleterious effects on human health occur both immediately and accumulatively, which can explain the lags in the cross-correlations.
The biological explanation for stochastic modeling by dynamic regression depends on climate feedback effect on the risks of dying due to a determined disease.It requires modeling the dependence between observations in time, since biologically death is manifested in the individual after a period of time subject to weather conditions that are adverse to health.
It was considered that the atmospheric and meteorological variables first place to come after the assessment of deaths.So that the previous cross-correlation analysis provides an indication of which variables and that gap should be considered in the modeling.
It is important to highlight that this study, in addition to capturing the significant associations, shows that it is possible to obtain consistent estimates of deaths due to respiratory diseases through the simultaneous effect and conditioning of environmental and climatic variables.Liu et al. (2012) found seasonal effects of the air pollutant concentrations in Tianjin, China and mortality due to general non-accidental causes in cardiovascular, cardiopulmonary, respiratory and ischemic heart disease subcategories.
Seasonal effects of air pollution were also found in this study for the city of São Paulo.Tianjin and São Paulo are classified as the cities with high levels of air pollution (WHO, 2013).The difference in effects was explained by human activity patterns, which change depending on the season.Liu et al. (2012) also suggested that the difference of the effects of pollutants should be taken into consideration when seeking to model mortality data explained by air pollution, which corroborates the results of Makie et al. (2002); Hagler et al. (2006); Galindo et al. (2008) andViana et al. (2014).Both these studies found that where the pollutants' composition is known, it varies in the spatial domain; which suggests that seasonal patterns should be examined by geographic region.Gamble et al. (2013) argued that the relationship between climate change and the emergence of diseases in elderly individuals serves as a parameter for planning actions in order to improve quality of life for this group.Confalonieri et al. (2009) found that even when meeting accept-able standards, air pollution in the city of São Paulo is harmful to health.
The city of São Paulo, according to the IBGE (2010), is the fourth largest urban agglomeration in the world and the sixth most polluted metropolis (WHO, 2005).In São Paulo air pollution kills more people than other public health problems, such as acquired immune deficiency syndrome and tuberculosis (Saldiva et al., 2008).The association between climate and mortality due to respiratory problems requires special attention, and this study makes an important contribution by describing an appropriate method able to consistently predict deaths due to atmospheric conditions.

Conclusions
The originality of this article is the use for dynamic regression modeling to model deaths, capturing associations not only with atmospheric and climatic conditions at time t, but also incorporating lags.We chose to work with monthly information to check whether the robustness of the model reflects the seasonality of the data.
The synergy and interaction are captured by the statistical significance of the model and supported by the physiological understanding of the effect of climate and pollutants on human health.
The results show that the DRM used here can satisfactorily predict deaths due to respiratory diseases, explained by atmospheric variables, although underestimating the observed deaths.We also verified that the model is able to forecast the seasonal patterns of observed data.The trend and seasonality functions added greater predictive power to the DRM.
The model has several strengths, including good fit.However, there are some limitations, such as the problem of not being able to work with individual risk factors for respiratory problems, such as smoking, allergies, genetic factors and viral infections, as well as natural causes such as deterioration of the respiratory system inherent to aging.In this type of study there are no individual observations, nor is there any determination of the extent to which increased concentration of pollutants in the atmosphere contributed to death.
In conclusion, the results of stochastic modeling showed that deaths can be predicted consistently from the combination of meteorological and air pollution variables, especially seasons.The dynamic regression modeling managed to capture the relationship between atmospheric conditions and mortality data of elderly individuals due to respiratory issues in São Paulo city.
Due to the complexity of this model, this study was limited to capturing only the lags in time.Future studies could expand the analysis by incorporating the effect of interactions between atmospheric and meteorological variables.

Figure 1 -
Figure 1 -Deaths due to Bronchitis, Pulmonary Emphysema and Asthma of elderly individuals who lived in the of São Paulo city, in the period between 2000 and 2010.

Figure 2 -
Figure 2 -Cross Correlations after pre-whitening among deaths and atmospheric variables.
namic regression.Based on the AIC, Model 3 showed the best fit.Several other authors, such as Zanobetti et al. (2000) and Coelho et al. (2010), present similar results.Besides having the best fit, Model 3 also presented advantages over the previous ones, since the addition of the trend and seasonality functions managed to capture the structural pattern of the observed data series.It also had better accuracy in prediction, as shown in Fig. 4. In Fig. 5, it is possible to see the difference between the predictions made by Model 2 (a) and Model 3 (b).The addition of the trend and seasonality functions reduced the uncertainty of forecast in Model 3 (b).

Figure 3 -Figure 4 -
Figure 3 -Observed deaths, predicted and planned for deaths of elderly people from Bronchitis, Pulmonary emphysema and Asthma in Sao Paulo city, model 2.

Table 1 -
Means Monthly of Air Pollutants, Meteorological Variables and Mortality of Elderly by Bronchitis, Pulmonary Emphysema and Asthma in the period 2000-2010 in São Paulo-SP.

Table 3 -
Dynamics Regression Model of deaths due to Bronchitis, Pulmonary Emphysema and Asthma of elderly individuals who lived in São Paulo, between 2000 and 2010 with the explanatory variables with statistically significant, trend and seasonality and autoregressive variable (Model 3).

Table 4 -
Deaths due to Bronchitis, Pulmonary Emphysema and Asthma, observed and predicted (model 3) for2011 of elderly individuals who lived in São Paulo.