## Services on Demand

## Journal

## Article

## Indicators

- Health indicators
- Cited by SciELO
- Access statistics

## Related links

- Cited by Google
- Similars in SciELO
- Similars in Google

## Share

## Pesquisa Operacional

##
*Print version* ISSN 0101-7438

### Pesqui. Oper. vol.30 no.2 Rio de Janeiro May/Aug. 2010

#### https://doi.org/10.1590/S0101-74382010000200010

**ARTIGOS REGULARES**

**Climate changes and their effects in the public health: use of poisson regression models**

**Jonas Bodini Alonso ^{I}; Jorge Alberto Achcar^{II}; Luiz Koodi Hotta^{III, *}**

^{I}Departamento de Estatística / IMECC. Universidade Estadual de Campinas (UNICAMP). Campinas - SP, Brasil

^{II}Departamento de Medicina Social / FMRP. Universidade de São Paulo (USP). Ribeirão Preto - SP, Brasil

^{III}Departamento de Estatística / IMECC. Universidade Estadual de Campinas (UNICAMP). Campinas - SP, Brasil. hotta@ime.unicamp.br

**ABSTRACT**

In this paper, we analyze the daily number of hospitalizations in São Paulo City, Brazil, in the period of January 01, 2002 to December 31, 2005. This data set relates to pneumonia, coronary ischemic diseases, diabetes and chronic diseases in different age categories. In order to verify the effect of climate changes the following covariates are considered: atmosphere pressure, air humidity, temperature, year season and also a covariate related to the week day when the hospitalization occurred. The possible effects of the assumed covariates in the number of hospitalization are studied using a Poisson regression model in the presence or not of a random effect which captures the possible correlation among the hospitalization accounting for the different age categories in the same day and the extra-Poisson variability for the longitudinal data. The inferences of interest are obtained using the Bayesian paradigm and MCMC (Markov chain Monte Carlo) methods.

**Keywords:** daily hospitalizations; climate changes; Bayesian analysis; MCMC methods.

**RESUMO**

Neste artigo, analisamos os dados relativos aos números diários de hospitalizações na cidade de São Paulo, Brasil no período de 01/01/2002 a 31/12/2005 devido a pneumonia, doenças isquêmicas, diabetes e doenças crônicas e de acordo com a faixa etária. Com o objetivo de estudar o efeito de mudanças climáticas são consideradas algumas covariáveis climáticas os índices diários de pressão atmosférica, umidade do ar, temperatura e estação do ano, e uma covariável relacionada ao dia da semana da ocorrência de hospitalização. Para verificar os efeitos das covariáveis nas respostas dadas pelo numero de hospitalizações, consideramos um modelo de regressão de Poisson na presença ou não de um efeito aleatório que captura a possível correlação entre as contagens para as faixas etárias de um mesmo dia e a variabilidade extra-poisson para os dados longitudinais. As inferências de interesse são obtidas usando o paradigma bayesiano e métodos de simulação MCMC (Monte Carlo em Cadeias de Markov).

**Palavras-chave:** hospitalizações diárias; fatores climáticos; análise bayesiana; métodos MCMC.

**1. Introduction**

An important problem related to public health is the possible relationship between climate factor levels and the incidence of some specific diseases. Great climate variations have been observed in the last years due to many different factors, in special, the degradation of the environmental conditions linked to the fast population increasing. In this way, there is a great interest by doctors of the public health to study the existing relations between the daily hospitalization countings due to some specific diseases with some daily climate levels as atmospheric pressure, air humidity and temperature. The effects of climate variations can be larger in high-risk age groups as newborns or old age persons. One way of considering this situation is the use of counting regression model for each age group. A special model for counting data is given by a Poisson regression model capturing the possible existing correlation among the hospitalization daily counting in each age class. Longitudinal Poisson data is common in many applications considering medical studies (see for example, Henderson & Shikamura, 2003; or Dunson & Herring, 2005), where the counting are measures for each sampling unit in different times or repeated measures.

In this study, we consider a data set related to the daily hospitalization counting due to pneumonia, coronary ischemic diseases, diabetes and chronic diseases in different age categories in the São Paulo city in the period ranging from 01/01/2002 to 12/31/2005 and classified in different patient age groups and in the presence of some climate covariates.

The hospitalization group denoted as Pneumonia is related to the codes J10 to J18 of the CID 10; the hospitalization group denoted as Chronic Diseases (of upper airways) is related to the codes J40 to J47 of the CID 10; the hospitalization group denoted as Coronary Ischemic Diseases is related to the codes I20 to I25 of the CID 10 and the hospitalization group denoted as Diabetes (Mellitus) is related to the codes E10 to E14 of the CID 10. This codification is used by the health system SUS of the São Paulo city and were available by the health office of the São Paulo city.

The age categories are defined taking as basis the group classification used by the Department of informatics of the Brazilian health system (DATASUS). In this system a person is classified as group 0 for ages up to 1 year old; group 1 for ages from 1 year old to 5 years old; group 2 for ages from 5 years old to 10 years old; and so on up to group 16 that includes persons aging more than 75 years old. In our study, we consider some special aggregation of these groups depending on the type of hospitalization. Two age neighboring groups for one type of hospitalization are aggregated in the same age group if they have similar behavior in term of number of hospitalization. The group classifications for each type of hospitalization are presented in table 1.

The climate data were available from the Institute of Astronomy and Geophysics of the University of São Paulo (IAG-USP). The climate covariates considered in this study are the daily average of atmospheric pressure, the daily average of air humidity and the daily average of temperature. These average levels are found considering the average of the daily observed maximum and minimum in each day. We also considered other covariates, as year seasons and weekly days.

The main goal of this study is to verify if these climate covariates affect the daily hospitalization counting and also to verify which population age groups are more susceptible to climate changes. These results can be of great interest for the public health system for prediction and administration of the hospital system.

To analyze this data set, we introduce two Poisson regression models in the presence or absence of a random factor which captures the correlation between the repeated measures for the same day and the presence of extra-Poisson variability for the data (see, for example, Albert, 1992; Achcar *et al.*, 2008).

Poisson regression models in the presence of frailties or random effects have been considered by many authors (see, for example, Crouchley & Davies, 1999; Korsgaard & Andersen, 1998; Legler & Ryan, 1997; Li, 2002; Moustaki & Knott, 2000; Petersen, 1998; or Sammel *et al.*, 1997).

The inferences of interest are obtained using Bayesian methods. The posterior summaries of interest are obtained via MCMC simulation methods as the popular Gibbs sampling algorithm (see, for example, Gelfand & Smith, 1990) or the Metropolis-Hastings algorithm (see, for example, Chib & Greenberg, 1995).

The paper is organized as follows. In section 2, we introduce the statistical model. Section 3 introduces a Bayesian analysis for the models. Section 4 introduces the analysis of the São Paulo City data set. Finally, Section 5 presents a discussion about the obtained results.

**2. The Statistical Model**

Let *N _{ij}* a random variable with Poisson distribution, i.e.

where *n _{ij}* is the number of hospitalizations in the i-th day of patients in age group j, i = 1 ,..., N (number of days) and j = 1, ..., K (number of age groups). In the application we consider the number of admissions by type of disease. Associated with each combination day / age, we consider the presence of covariates

*x*

_{1i}(average atmospheric pressure between the minimum and maximum on the i-th day),

*x*

_{2i}(average humidity between the minimum and maximum on the i-th day),

*x*

_{3i}(average temperature between minimum and maximum on the i-th day) and dummy variables or indicator

*x*

_{4i}(equals 1 if the fall in the i-th day, 0 otherwise),

*x*5

_{i}(equal to 1 if the winter i-th day, 0 otherwise),

*x*

_{6i}(equal to 1 if spring in the i-th day, 0 otherwise), and

*x*

_{7i}(equal to 1 if the i-th day is Saturday or Sunday, 0 otherwise).

As we have longitudinal data representing the number of hospitalization on the same day for different age groups, we introduce a random effect or frailty *w _{i}* that captures the correlation between repeated measurements for the i-th day and extra-Poisson variability.

Assuming a Poisson distribution (1) for *N _{ij}* with parameter λ

*ij*, consider the regression model,

where

, 1 = 1,...,3; θ_{j} = (α_{j},β_{1j},β_{2j},β_{3j},β_{4j},β_{5j},β_{6j},β_{7j})* ^{T}*, j = 1,...K is the vector of unknown parameters and

*w*is a random effect with normal distribution,

_{i}where i = 1, ..., N.

Note that, according to equation (1), since *N _{ij}* has a Poisson distribution, then,

*E*(

*N*) =

_{ij}*Var*(

*N*) = λ

_{ij}*, where λ*

_{ij}*is given in (3). Also note that non-conditional means and variances are given by*

_{ij}where *0 _{j}* is defined in (2) and

*x*= (

_{i}*x*

_{1i},...,

*x*

_{7i})

*is the covariate vector associated with the i-th day.*

^{T}As the random effects *w _{i}* have a normal distribution (4), then exp(

*w*) have a log-normal distribution with mean equals to exp(σ

_{i}^{2}/2) and variance equals to exp(σ

^{2}/2)(exp(σ

^{2}/2)-1), i.e.,

and

From (6) and (7), we observe that the mean and variance of *N _{ij}* are different, that is, we have the presence of an extra-Poisson variability given by exp(σ

^{2}/2)(exp(σ

^{2}/2)-1), where η

*is given by (3).*

_{ij}For the analysis of data from hospitalizations, let us consider two possible models: a model denoted as "Model 1" without the presence of random effect *w _{i}* and a "Model 2" with the presence of random effects,

*w*, i = 1,..., N.

_{i}Note that the presence of a random effect in the Poisson regression model can hinder the achievement of classical inferences for the parameters of the model. A possible simplification to obtain the inferences of interest is to consider Bayesian methods. In addition, the Bayesian methods allow the incorporation of information from experts in the prior distribution for the model parameters. This methodology has been considered by many authors in the analysis of longitudinal count data (see e.g., Albert & Chib, 1993; Chib *et al.*, 1998; Clayton, 1991; Dunson, 2000, 2003).

**3. Bayesian Analysis**

Assuming the model given by (2) and (3), the likelihood function for θ = (θ_{1},...,θ* _{k}*)

*, given the observed data*

^{T}*N*, the unobserved variables

_{ij}*w*and the vector of covariates

_{i}*x*, is given by

_{ij}For the first stage of the hierarchical Bayesian analysis, assuming "Model 2" in the presence of random effects *w _{i}*, we consider the following prior distributions for model parameters

where l = 1, ..., 7 ; j = 1,...,K ; *a _{j}*,

*b*and

_{j}*c*are known hyperparameters.

_{j}For the second stage of the hierarchical Bayesian analysis, where *w _{i}* has a normal distribution

*N*(0,σ

^{2}), we assume that

where *d* and *e* are known parameters, and Gamma (d, e) denotes a gamma distribution with mean *d/e* and variance *d/e*^{2}. We assume that the prior distributions of the parameters are independent.

For "Model 1" without the presence of random effect *w ^{i}*, we assume the same prior distributions for α

*and β*

_{j}*given in (9).*

_{ij}The joint posterior distribution for θ and σ^{2} is obtained by combining the likelihood function (8) with the prior joint distribution for the parameters and *w _{i}*, i.e.

where **N** is the vector of data *N _{ij}*, and

**x**is the vector of covariates xli; the likelihood function

*L(0)*is given in (8).

Summaries of the a posteriori distributions of interest are obtained using MCMC methods. In this way, we simulate samples from the conditional distributions for each parameter given the other parameters and the vectors of data and covariates.

A great simplification is obtained using the software WinBUGS (Spiegelhalter *et al.*, 2003) that only requires the specification of the distribution to the data and prior distributions for the parameters.

The selection of the best model can be done using several Bayesian discrimination methods available in the literature. In our case we consider the Deviance Information Criterion (DIC) (see Spiegelhalter *et al.*, 2002) which is a useful criterion for selecting models when samples of posterior distribution for the model parameters are obtained using MCMC methods.

The deviance is defined by

where θ is a vector of unknown parameters of the model *L(*θ*)* is the likelihood function and *c* is a constant that need not be known in the comparison of models.

The DIC criterion is defined by

where *D*(0) is the deviation of the average evaluated in the posteriori mean 0 and *n _{D}* is the effective number of model parameters given by

*n*= -

_{D}*D*(θ), where =

*E*[

*D*(θ)] is the posteriori deviation measuring the quality of the adjustment by the model. Smaller values of DIC indicate better models.

**4. Data Analysis of Daily Hospital Admissions in São Paulo City**

Initially, let us consider the count data for daily hospital admissions in São Paulo for chronic conditions in the period 01/01/2002 to 31/12/2005. Assuming k = 8 age groups and the "Model 1" without the presence of a random effect with prior distributions given in (9) for *a _{j}* and β

_{lj}; l = 1, ..., 7; j = 1, ..., 8 and, l = 1, ..., 7, j = 1, ..., 8 and values of the hyperparameters equal to α

*= 0, and , we have used the WinBUGS software to simulate 1000 samples of the joint posterior distribution for the parameters. After a burn-in-sample of size of 5000, we took samples spaced by 10 in the Gibbs sampling algorithm in order to eliminate the effect of the initial values of parameters used in the iterative procedure and to have approximately uncorrelated samples.*

_{j}Similarly, with the same steps used to generate samples assuming "Model 1", we generate 1,000 samples from the joint posterior distribution (10) considering "Model 2" in the presence of random effect *w _{i}* with normal distribution (4), with d = e = 1 in the prior for σ

^{2}(10) and assuming an informative priori for α

*with α*

_{j}*= 1 and = 100. This choice of values of the hyperparameters for the prior was based on the results of "Model 1" to ensure the convergence of the algorithm simulation. For the parameter β*

_{j}*we used the same values considered for the hyperparameters of "Model 1".*

_{lj}The convergence of the algorithm was verified from the graphs of the simulated samples.

Table 2 presents the summaries of the posterior distributions of interest in case of hospitalization due to chronic diseases for models 1 and 2. The 95% credibility interval is given by 2.5% and 97.5% percentiles of the posteriori distributions. For brevity we present only the results of parameters whose credibility interval of at least one of the models does not include the value zero. The same criteria is used for other type of admissions.

For the choice between the two proposed models, we obtain from the 1000 simulated Gibbs samples, DIC equal to 46,664.9 for "Model 1" and DIC equal to 45,624.5 DIC for "Model 2". Thus, "Model 2" presents the best adjustment for the data according to DIC criterion because the value of DIC is lower for "Model 2" than for "Model 1".

We also compare the two models, evaluating the sum of squares of differences, *s*^{2}(v), v = 1, 2 (models 1 and 2), between the observed counts and means of the Monte Carlo posteriori distribution of λ* _{ij}*. For "Model 1" we find

*s*

^{2}(1)=5,3281.3 and for "Model 2", we find

*s*

^{2}(1)=3,9277.8, which gives strong indication in favor of "Model 2".

Similarly, considering the same values for the hyperparameters of prior distributions for the parameters of the models made in case of hospitalization due to chronic diseases, and the same MCMC sampling simulation method used previously, we have in tables 3, 4 and 5 the summaries of the posterior distributions for cases of hospitalizations due to ischemic heart disease (with 8 age groups) due to pneumonia (with 6 age groups) and due to diabetes (with 10 age groups).

In all cases we observe that "Model 2" (presence of random effect) presents the best adjustment to the data using as the criterion the lowest value of the sum of squares of the differences *s*^{2}(v) (observed - fitted).

The DIC criterion also selects "Model 2", except for the case of hospitalization due to diabetes. However, we decided for "Model 2" even for this case because the criterion of sum of squares of the differences provides a strong indication in favor of "Model 2" and in the case of hospitalization due to diabetes there is a large number days with zero hospitalization, which can invalidate the use of DIC criterion for model discrimination.

**5. Discussion of Results**

Considering the case of daily hospitalizations for chronic conditions and assuming that "Model 2" best fits the data, we have from table 2:

(1) The average atmospheric pressure in the i-th day is significant only for the group 8 (>= 75 years old). Note that the average of the posterior distribution of β

_{18}estimated by Monte Carlo based on 1,000 samples generated by Gibbs sampling algorithm is_{18}= -0.165, that is, we observe that there is a decrease in the number of daily hospital admissions due to an increase in the average level of air pressure for this group. That is, low atmospheric pressure leads to an increase in hospital admissions due to chronic diseases.(2) The average humidity in the i-th day is significant for groups 2, 6 and 8 (

_{22}= 0.0036, i.e., higher humidity leads to an increase in daily admissions for chronic conditions for patients in group 2; for groups 6 and 8 we have_{26}= 0.0062 and_{28}= -0.00866, that is, higher daily average humidity leads to a decrease in the number of hospitalization due to chronic diseases).(3) The average temperature in the i-th day is significantly positive (i.e., the increase in average daily temperature leads to an increase in the number of hospitalizations due to chronic diseases for under 10 years old persons (

_{31}= 0.00404;_{32}= 0.0348 and_{35}= 0.0315).(4) The fall is significant for all groups except 5 and 6 ([15,60) years old person). All effects are positive, except for group 4. (

_{41}= 1,000;_{42}= 0.4968;_{43}= 0.2809;_{44}= -0.1687;_{47}= 0.0922 and_{48}= 0.1092).(5) The winter is significant for all groups except ages 60 and higher. The effect is positive for groups corresponding to persons less than 10 years old (

_{51}= -0.2242;_{52}= 0.1010;_{53}= 0,1981) and negative for groups 5 and 6 (_{55}= -0.1899 and_{56}= -0.2016).(6) The spring is significant for groups 1,2,3 and 4 (

_{61}= 0.3699;_{62}= 0.1482;_{63}=0.2904 ,_{64}= -0.2046). The effect is positive for groups 1, 2 and 3 (increase of admissions) and negative for group 4 (decrease of admissions).(7) The Weekend (Saturday-Sunday) is significant for all groups except group 3, all with negative effect (

_{71}= -0.1616;_{72}= -0.1855;_{74}= -0.2969;_{75}= -0.3060;_{76}= -0.2239;_{77}= -0.3113 and_{78}= -0.3206).

Similarly, considering the case of hospitalization due to ischemic and assuming "Model 2", we have the following results (see table 3) in a simplified form:

(1) The average pressure in the i-th day is significant only for groups 2 and 7.

_{12}= 0.0245 and_{17 = 0.0159}.(2) The average humidity in the i-th day is significant only for group 2 (negative) and 7 (negative).

(3) The average temperature in the i-th day is significant only for group 7 (positive, = 0.0333).

(4) The fall is significant negative for group 3. (

_{43}= -0.1487).(5) The winter is significantly negative for groups 1, 2, 3 and 6.

(6) The spring is significant for all groups except for groups 3 and 4. The effect is negative for less than 40 years old persons and positive for 50 years old and older persons.

(7) The weekend (Saturday-Sunday) is significantly negative for all groups.

In the case of pneumonia considering "Model 2", we have (see table 4):

(1) The average pressure in the i-th day is significantly negative for groups 1, 3, 4 and 5.

(2) The average humidity in the i-th day is significantly negative for groups 1, 5 and 6 (less than one year old and 45 years old or older persons).

(3) The average temperature in the i-th day is significant for group 2.

_{32}= 0.0183.(4) The autumn is significantly positive for all groups.

(5) The winter is significantly positive for groups 1, 2 and 3 (less than 10 years old groups).

(6) The spring is significantly positive for groups 1, 2, 3 and 4 (less than 45 years old groups).

(7) The weekend (Saturday-Sunday) is significantly negative for all groups.

In the case of diabetes also assuming "Model 2", we have (see table 5):

(1) The average pressure in the i-th day is significantly positive for groups 1, 2, 5 and 7.

(2) The average humidity in the i-th day is significantly negative, except for groups 7, 8 and 10.

(3) The average temperature in the i-th day is significantly negative for groups 1, 2, 5 and 6.

(4) The autumn is significantly negative for all groups.

(5) The winter is significantly negative for all groups.

(6) The spring is significantly negative for all groups.

(7) The weekend (Saturday-Sunday) is significantly negative for all groups.

In view of these results, we can reach the following overall conclusions:

1. In general, all the variables are influencing the average number of admissions.

2. The generalized linear models with random effect using as dependent variable the logarithm of the climate covariates showed that climate change is related to the daily numbers of admissions of all disease groups studied in this paper.

3. The significance of dummy variables corresponding to the seasons and the weekend indicates that seasonal patterns also have influence on the behavior of the number of hospitalizations.

4. The meteorological conditions have more influence in groups corresponding to children and elderly.

**Acknowledgments**

This work was partially supported by CNPq, CAPES and FAPESP. The authors thank the laboratory EPIFISMA (IMECC-UNICAMP).

**References**

(1) Albert, J.H. (1992). Bayesian analysis of a Poisson random effect model for home run hitters. *The American Statistician*, **16**, 246-253. [ Links ]

(2) Albert, J.H. & Chib, S. (1993). Bayesian analysis of binary and polychotonious response data. *Journal of American Statistical Association*, **88**, 669-679. [ Links ]

(3) Achcar, J.A.; Coelho-Barros, E. & Martinez, E.Z. (2008). Statistical analysis for longitudinal counting data in the presence of a covariate considering different "frailty" models. *Brazilian Journal of Probability and Statistics*, **22**, 183-205. [ Links ]

(4) Chib, S. & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. *The American Statistician*, **49**, 327-335. [ Links ]

(5) Chib, S.; Greenberg, E. & Winkelmann, R. (1998). Posterior simulation and Bayes factors in panel count data models. *Journal of Econometrics*, **86**, 33-54. [ Links ]

(6) Clayton, David G. (1991). A Monte Carlo method for Bayesian inference in frailty models. *Biometrics*, **47**, 467-485. [ Links ]

(7) Crouchley, R. & Davies, R.B. (1999). A comparison of population average and random effects models for the analysis of longitudinal count data with baseline information. *Journal of the Royal Statistical Society*, A, **162**, 331-347. [ Links ]

(8) DATASUS - Department de Informatics do SUS, Secretaria Executiva, Ministério da SaúdeVersão 1.6c - ©1993 by CBCD and DATASUS. **In**: <www.datasus.gov.br/> Acessed on October 13, 2008. [ Links ]

(9) Dunson, D.B. (2000) Bayesian latent variable models for clustered mixed outcomes. *Journal of the Royal Statistical Society*, B, **62**, 335-366. [ Links ]

(10) Dunson, D.B. (2003) Dynamic latent trail models for multidimensional longitudinal data. *Journal of the American Statistical Association*, **85**, 385-409. [ Links ]

(11) Dunson, D.B. & Herrinng, A.H. (2005). Bayesian latent variable models for mixed discrete outcomes. *Biostatistics*, **6**, 11-25. [ Links ]

(12) Gelfand, A.E. & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. *Journal of the American Statistical Association*, **85**, 385-409. [ Links ]

(13) Henderson, R. & Shimakura, S. (2003). A serially correlated gamma frailty model for longitudinal count data. *Biometrika*, **90**, 355-366. [ Links ]

(14) Korsgaard, I.R. & Andersen, A.H. (1998). The additive genetic gamma frailty model. *Scandinavian Journal of Statistics*, **25**, 255-269. [ Links ]

(15) Legler, J.M. & Ryan, L.M. (1997). Latent variable models for teratogenesis using multiply binary outcomes. *Journal of American Statistical Association*, **92**, 13-20. [ Links ]

(16) Li, H.Z. (2002). An additive genetic gamma frailty model for linkage analysis of diseases with variable age of owsed using nuclear families. *Lifetime Data Analysis*, **8**, 315-334. [ Links ]

(17) Moustaki, I. & Knott, M. (2000). Generalized latent trail models. *Psychometrika*, **65**, 391-411. [ Links ]

(18) Petersen, J.H. (1998). An Additive frailty model for correlated life times. *Biometrics*, **54**, 646-661. [ Links ]

(19) Sammel, M.D.; Ryan, L.M. & Legler, J.M. (1997). Latent variable models and mixed discrete and continuous outcomes. *Journal of the Royal Statistical Society*, B, **59,** 667-678. [ Links ]

(20) Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P. & Vander Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). *Journal of the Royal Statistical Society*, B, **64**, 583-639. [ Links ]

(21) Spiegelhalter, D.J.; Thomas, A.; Best, N. & Gilks, W.R. (2003). Winbugs: Bayesian inference using Gibbs version 1.4. Cambridge, UK: *MRC Biostatistics Unit*, Institute of Public Healthy. [ Links ]

Recebido em 12/2008; aceito em 09/2009

Received December 2008; accepted September 2009

* *Corresponding author* / autor para quem as correspondências devem ser encaminhadas