Use of Poisson spatiotemporal regression models for the Brazilian Amazon forest : malaria count data

Introduction: Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using Bayesian spatiotemporal methods. Methods: We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a Bayesian approach and Markov Chain Monte Carlo (MCMC) methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. Results: The model proposed here suggests that deforestation rate, the number of inhabitants per km2, and the human development index (HDI) are important in the prediction of malaria cases. Conclusions: It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling malaria counts.

Malaria is an endemic disease common in tropical and subtropical regions, including parts of the Americas, Asia, and Africa, and it is one of the most common infectious diseases.Malaria causes about 400-900 million cases of fever and approximately one to three million deaths annually [1][2] .It is a disease commonly associated with poverty.There are over 400 species of the malarial parasite (Plasmodium spp.), many of which infect different cold-and warm-blooded animals; however, only four routinely infect humans (P.falciparum, P. vivax, P. malariae, and P. ovale) 3 , and the two most common are P. falciparum and P. vivax 4 .P. falciparum causes the most severe disease and almost all fatalities 4 .Each one of these four malarial parasites is transmitted from one person to another by the bite of an infected female Anopheles spp.mosquito.Ecological alterations can affect the spread of these insects and, consequently, the spread of malaria.
Many papers have been introduced in the literature on the epidemiology of malaria in different forest ecological zones around the world.Dash et al. 5 introduced a paper on malaria epidemics in India, a country with high incidence of the disease.They observed that heterogeneity and variability exist in the risk of malaria transmission between and within the provinces of India because many ecotypes/paradigms of malaria are recognized, with people living in forest pockets in different provinces contributing to morbidity and mortality due to malaria.Another study 6 considered the use of spatiotemporal analytical tools to determine the social and environmental drivers of malaria risk in Vietnam.The authors analysed malaria data on reported P. falciparum and P. vivax infections by month and by district.They observed a strong temporal and spatial heterogeneity in counts of malaria cases and also considered some risk factors, such as the district's population living below the poverty line, the percent of districts covered by

METHODS
Achcar JA et al -Spatiotemporal models for malaria data forest, median elevation, median long-term average precipitation, and minimum temperature.Owusu-Agyei et al. 7 studied the epidemiology of malaria in the forest-savanna zone of Ghana.They found that the clinical malaria attack rate and the prevalence of malaria in children were very high in spite of a decrease in several parts of Africa as a result of control interventions.In another paper 8 , the authors concluded that the dynamics and seasonal abundance of malaria vectors in the Kintampo area of Ghana were influenced by microecology, rainfall, and temperature patterns.Reid et al. 9 considered the use of Bayesian geostatistical models in mapping the malaria risk in Bangladesh.In all these studies, it is important to point out that the presence of different socioeconomic factors could have had great impact on the spread of malaria during that time.Another important point is related to the effect of neighboring regions on epidemics of malaria, which could justify the use of a space-time framework in these studies.
In Brazil, especially in the Amazon forest region, malaria is still a great health concern for health authorities.The prediction and characterization of malaria occurrence in an important world region, such as the Amazon forest, are important in malaria surveillance and control programs, and in understanding why and where resources should be redirected to present or reduce the projected increased disease occurrence.Different socioeconomic or spatiotemporal factors could be associated with the increase in malaria cases, such as the number of inhabitants per square kilometer, the proportion of urban population, the number of doctors per 10,000 inhabitants, and the human development index (HDI).Other factors, such as spatial or temporal aspects and deforestation, might be of interest, the latter being considered a strong determinant of the increasing incidence of malaria 10 .
In Figure 1, panel A, we have the yearly numbers of malaria fever cases per 1,000 inhabitants from 1999 to 2008 in nine provinces that belong to the Brazilian Amazon forest region (data source: DATASUS, Ministério da Saúde, Brazil).From Figure 1, we observe that, despite the Brazilian health authorities' efforts to control malaria fever, there are increasing numbers of malaria in two provinces, Acre and Rondônia, which have been undergoing great economic development in the last years (especially in the period from 1999 to 2006), resulting in population growth and increasing deforestation to create new cattle and agriculture farms.For the other seven provinces of the Amazon region, we also observe large numbers of malaria cases.
In this paper, we investigate the association between environmental variables and the yearly registers of malaria in the Brazilian Amazon region using a Poisson [11][12] spatiotemporal model under the Bayesian paradigm [13][14] , with a latent term that captures the spatial effects of neighboring provinces and another latent term that captures the possible temporal correlation in each province through the years.

Study design
This was an ecological study, with the Brazilian provinces of the Amazon region as units of analysis.Data were obtained from Sistema de Informações de Malária (SISMAL), Instituto Brasileiro de Geografia e Estatística (IBGE), Departamento de Informática do Sistema Único de Saúde (DATASUS), and Programa de Cálculo do Desflorestamento da Amazônia (PRODES).The study period covered the years from 1999 to 2008.A preliminary analysis of this data set gave some information on the behavior of the malaria rate and allowed us to identify isolated associated risk factors, but the proposed statistical model provided a general relationship among the malaria incidence rate and some risk factors, the spatial dependence, and the longitudinal trend over time.

The statistical model
Let Y i,j be the yearly number of malaria cases for province i in the year j, i=1, …, n and j=1, …, T, with a Poisson distribution, given by where P(μ) denotes a Poisson distribution with parameter μ.In connection with the malaria data shown in Figure 1, panel A, we consider the following covariates: X 1i , denoting the number of  1.
Considering the covariates X 1 , X 2 , X 3 , X 4 , and X 5 , let us assume the following regression model for the Poisson distribution (1): for i=1, …, 9 (provinces) and j=1, …, 10 (years); X l , l=1,…, 4 denotes the sample average for the covariate X l , that is, n X l = .In (2), we observe that β 0j , β 1j , β 2j , β 3j , β 4j , and β 5j are fixed effect regression parameters associated with the covariates X 1 , X 2 , X 3 , X 4 , and X 5 ; i=1,…, 9 and j=1, ….,10.b i is a random effect that captures the possible correlations among the malaria counts, taking into account the region effects of neighboring provinces assumed to have a normal CAR structure [15][16][17][18][19][20][21] model, that is, where A * (i) denotes the set of neighbors corresponding to province i, n(i) denotes the number elements in A * (i), η i denotes the mean of the neighboring random effects for province i, and ζ b is an unknown parameter.
In (2) we also assume a random effect W ij for the longitudinal trend specified as a Gaussian process with a multivariate normal distribution with a mean vector 10 x 1, with all components equal to zero and a 10 x 10 variancecovariance matrix Σ=[Cov (W ij , W ij* )], with elements given by: assuming a fixed value for K. Different fixed values for K can be considered, which gives a great flexibility of fit for the data, and θ k , k=1, 2, …, K, is an unknown parameter.Observe that model (4) generalizes the longitudinal trend specification introduced by Branscum et al. 22 For the first stage of the hierarchical Bayesian analysis, let us assume normal prior distributions β lj ~ N(0, a 2 lj ), l=0,1,2,…,5, j=1,…,10, where a lj are known hyperparameters.Taking large values for a lj , we have highly dispersed prior distributions for the regression parameters.For the second stage of the hierarchical Bayesian analysis, let us assume uniform prior distributions ς b ~ U(0, c) and θ k ~ U(0, d k ) for k=1, 2, …, K; c and d k are known hyperparameters.
For a hierarchical Bayesian analysis of the model, we consider the use of Markov Chain Monte Carlo (MCMC) methods [13][14] .For the model choice, assuming different values for K in (4), we use some existing Bayesian model discrimination criteria, one of which is given by the Deviance Information Criterion (DIC) 23 .The smaller the DIC, the better the fit of the model for the data.We also consider some goodness-of-fit techniques in choosing the best model.In this way, we compare the observed data with the fitted or estimated posterior means for μ ij using the simulated Gibbs samples for each parameter of the model, given by: To verify if the covariate deforestation rate and the covariate number of inhabitants per km 2 , percentage of urban population, number of doctors per 10,000 inhabitants, and HDI index have some significative effects in the yearly counts of malaria in the Brazilian provinces of the Amazon forest region, we assume the Poisson regression considering two special cases.First, a model defined by ( 1) and ( 2), not considering the temporal random effect W ij and assuming that the random effect b i has a normal distribution N(0, σ b ), where σ b is an unknown variance; let us denote this as model 1.Second, a model defined by ( 1) and ( 2), with a normal CAR structure (3) for the random effects b i (spatial structure) and a multivariate normal distribution with covariance structure (4) for the random effect W ij (temporal structure) with K=2 (the neighboring structure is given in Table 1); let us denote this as model 2.
For a hierarchical Bayesian analysis of model 1 and model 2, let us assume a uniform U(0,1) prior distribution for ς h , where ς h =1/ σ b , the normal prior distributions for the regression parameters, β lj ~ N(0, a lj ), l=0, 1, …, 5; j=1, 2, …, 10; and the uniform prior distributions for ς b and θ k , k=1, 2 with a lj =10; b=0, c=1, d 1 =3, and d 2 =1.Using the WinBUGS software 24 , we simulate the two models: 10,000 initial Gibbs samples as a burn-in sample to eliminate the effect of the initial values on the Gibbs sampling algorithm.After this burnin sample period, we simulate another 1,600 samples, taking every 50 th sample to have approximately uncorrelated samples for the joint posterior distribution of all parameters of the model.Convergence of the Gibbs sampling algorithm was observed using plots of the simulated time-series samples.

RESULTS
Considering the yearly numbers of malaria cases per 1,000 inhabitants in the Amazon forest region from 1999 to 2008, we observe (Figure 1) that for most of the provinces, there was a decrease from 1999 to approximately the year 2002, an increase starting in the year 2002 to approximately the year 2005, and a decrease after the year 2005.
We also observe that the deforestation rate in the Brazilian Amazon forest region presents a behavior similar to the yearly counts of malaria cases, that is, an increase from 1999 to 2004 and a decrease after 2004 in almost all the provinces (Figure 1, panel B).
Assuming model 1, we have in Table 2 the posterior summaries of the parameters that do not include zero in the 95% credible intervals, that is, the parameters that have significative effects.From the obtained results in Table 2, we can have the following interpretations: I) the covariate X 1 (number of inhabitants per km 2 ) presents a significative effect on the malaria count in the year 2006 (zero is not included in the 95% credible interval for β 18 ); II) the covariate X 4 (HDI index) presents a significative effect on the malaria count in the years 1999 and 2000 (zero is not included in the 95% credible  intervals for β 41 and β 42 ); and III) the covariate X 5 (deforestation rate) presents a significative effect on the malaria count in the years 1999,  2000, 2001, 2002, and 2003 (zero is not included in the 95% credible intervals for β 51 , β 52 , β 53 , β 54 , and β 55 ).These regression parameters correspond to the years with larger deforestation rates (Figure 1).
Assuming model 2, we have in Table 2 the posterior summaries of the parameters that present some significative effects (zero is not included in the 95% credible intervals).From the results in Table 2 (model 2), we observe that the covariate X 5 (deforestation rate) presents a significative effect on the malaria count in the years 2001 and 2002 (zero is not included in the 95% credible intervals for β 53 and β 54 ).
Assuming model 1, we obtain from the WinBUGS output a Monte Carlo estimate for DIC given by DIC = 758.994.In this case, we found C(1)=308.105(see equation ( 5)).
Considering model 2, we obtained DIC = 554.580and C(2)=89.125.That is, model 2 is better fitted to the malaria count in the Brazilian Amazon forest region since the estimated DIC and the sum of absolute values for the differences between y ij and μ ij give smaller values in this model.

DISCUSSION
Deforestation is associated with loss of biodiversity, the alteration of many fundamental aspects of ecosystems, and an increase in the prevalence and incidence of many human vector-borne diseases 25 .The year-round high rainfall and temperatures in tropical forests are ecological characteristics that favor the development of many kinds of mosquitoes that transmit pathogens that cause disease in humans 26 .Thus, the process of clearing forests and the subsequent human activities, such as agricultural and hydropower developments, have a high influence on the prevalence and incidence of diseases such as human malaria 27 .In a study that examined the ecological alteration in the Peruvian Amazon 28 , it was shown that various landscapes and ecological features associated with deforestation were positively associated with Anopheles darlingi larval breeding sites.This is considered the major South American malaria vector.Another study 29 showed that in western Kenya, deforested sites had higher temperatures and relative humidities, and the overall infection rate of Anopheles mosquitoes was increased compared with that in forested sites.In addition, the vectorial capacity was estimated to be 77.7% higher in the deforested site than in the forested site.In 1981, the construction of the Tucuruí hydroelectric dam in southeast Pará province, Brazil, caused enormous environmental changes, deforestation, and human migration to the region, and a study 30 confirmed the elevation of malaria to the epidemic level since then.In a review of the literature 27 , sixty examples of changes in anopheline ecology throughout the tropical world were identified as a consequence of deforestation and agricultural development.
The present study also found that deforestation rate is important in the prediction of malaria cases.In addition, the Bayesian model here proposed suggests that the number of inhabitants per km 2 and the HDI index have some association with the number of malaria cases.The growing population in the Brazilian Amazon region is associated with human migration motivated by new agricultural settlements and informal mines (garimpos) 31 that promote environmental changes and contribute to the spread of malaria.The association of malaria with the HDI index and poverty was also discussed in a recent article 32 , which argued that the North and Northeast regions of Brazil have the lowest HDI indexes and the highest rates of neglected tropical diseases.In this context, it is important to point out that areas with low human development tend to have poor access to health care and malaria interventions, increasing the vulnerability of the poorest.
In view of the results of the present study and other research reported in the literature, it is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk.The possible environmental interventions to reduce malaria risk in the Brazilian Amazon forest region include policies that promote sustainable agriculture aimed at protecting the rainforest and incentives for forest regeneration.
To map the malaria count data in the Amazon region, the provinces are used as geographical limits.Thus, we can note that the areas of the provinces are too large, and this is one of the limitations of the study.However, more refined data are not available for small regions, making it too costly to develop a more detailed analysis of the data.In addition, the migration of individuals into or out of the provinces is not controlled in the data analysis, which can cause ecological bias 33 .Another possible source of bias is the use of data from different systems of information and its subsequent limitations.As examples of this, the official records of malaria are likely to be biased and underreported given the lack of health facilities in remote or poor areas 34 , and the available deforestation data are subject to a possible lack of accuracy.Finally, it is also possible to conclude that the use of Poisson regression models in the presence of random effects that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling disease counts in the presence of covariates.The use of spatiotemporal models, as introduced in this paper, to analyse the malaria data of the Brazilian Amazon forest region is becoming popular in the analysis of longitudinal disease count data under

FIGURE 1 -
FIGURE 1 -A) Yearly numbers of malaria per 1,000 inhabitants in the Brazilian provinces in the Amazon forest region.B) Deforestation rate in the Brazilian provinces in the Amazon forest region (km 2 /year).

2 (
It is also interesting to observe that model 2 in the presence of a spatiotemporal structure gives a better fit for the data when comparing the individual Monte Carlo estimates for the means of the Poisson model with the observed malaria data.Assuming model 2, which gives the best fit for the malaria data shown in Figure1, we have in Figure2the different estimates for the malaria rates in the Brazilian Amazon forest provinces Achcar JA et al -Spatiotemporal models for malaria data ^^2 2 from 1999 to 2008.From Figure2, we observe that the mean rates for the provinces in the Brazilian Amazon forest region increased greatly in the years from 2004 to 2007, especially for the provinces of Amazonas, Acre, and Rondônia.

FIGURE 2 -
FIGURE 2 -Estimates of the mean malaria rates in the Brazilian provinces in the Amazon forest region from 1999 to 2008.

TABLE 1 -Covariates for the malaria count data in the Amazon forest region and neighboring structure for the Brazilian provinces.
(in the year 2004) in province i; X 2i , denoting the percentage of urban population (in the year 2004) in province i; X 3i , denoting the number of doctors per 10,000 inhabitants (in the year 2005) in province i; and X 4i , denoting the human development index (HDI) (in the year 2005) in province i of the Amazon forest region, i=1, …, 9. We also consider the deforestation index for province i in the year j as another covariate, denoted by X 5ij , i=1,…,9, j=1,…,10 (corresponding to the years 1999 to 2008).The covariate values are introduced in Table HDI: human development index.Source: www.portalbrasil.net/brasil.htm/maps.The numbers within parentheses after the names of the provinces correspond to their individual labels in the statistical model.