Using satellite data to study the relationship between rainfall and diarrheal diseases in a southwestern Amazon basin

The North is the Brazilian region with the second-highest incidence rate of diarrheal diseases in children under 5 years old. The aim of this study was to investigate the relationship of rainfall and water level, principally during the rainy season, with the incidence rate of this disease in a southwestern Amazon basin. Rainfall estimates and water level were correlated with each other, and both were correlated with the diarrheal incidence rate. For the Alto Acre (Upper Acre) region, a time-lag of 2 to 3 days is the best interval to observe the impact of rainfall on the water level (R = 0.35). In the Lower Acre region this time-lag increased (4 days) and the correlation value decreased. The correlation between rainfall and diarrheal disease was better in the Lower Acre region (Acrelândia, R = 0.7) and for rainfall upstream of the city. Between water level and diarrheal disease, the best results were found for the Brasiléia gauging station (Brasiléia, R = 0.3; Epitaciolândia, R = 0.5). The results of this study may support planning and the allocation of financial resources to prioritize actions by local Civil Defense and health care services before, during and after the rainy season.


Keywords: Infant diarrhea, Rainy season, Remote sensing, Environmental health surveillance, Brazilian Amazon

Introduction
Under normal climate conditions, diarrheal diseases and pneumonia are the major causes of mortality among children under five years old, especially in poorer countries, accounting for 29% of total deaths per year worldwide 1. These diseases have also been widely reported after geophysical disasters and hydrometeorological events such as floods 2. According to statistics obtained from the Mortality Information System in Brazil 3, under the Informatics Department of the Health System (DATASUS), diarrhea and gastroenteritis were among the main causes of child mortality in Brazil from 1996 to 2012. In the Northeast region, this disease is the 4th cause of death (6.2 cases per 1,000 children). Nevertheless, the higher number of children in this specific region must be considered. In the North and Central-West regions diarrhea is the 8th main cause of infant mortality, with incidence rates of 2.4 and 1.7 per 1,000 children, respectively.
Several studies have tried to establish a link between the occurrence of waterborne infectious diseases (such as diarrhea) and climatic and hydrological patterns 4,5. Some studies analyzed data related to climate extremes and flood episodes to discuss the impacts on health 6,7. Waterborne diseases related to extreme climatic events include cholera, leptospirosis, hepatitis, bacillary dysentery and typhoid fever. These extreme episodes lead to a decline in hygiene practices due to the shortage of fresh water and to the damage caused by the lack of a sewage system. In addition, the large number of people accommodated in shelters during the flood period further increases the incidence of these diseases.
Since 2012, the southwestern Amazon has been severely affected by extreme floods 8,9. In 2014, the mayor of Rio Branco, the capital of Acre State, declared a state of emergency, as did Peruvian, Bolivian and other Brazilian cities. More than 2,000 houses were affected in the capital of Acre. More recently, in the first half of 2015, another flood affected this region and again exposed the Amazonian population to the impact of waterborne diseases 10.
The number of meteorological stations and rain gauges in Amazonia is very limited considering the nature, scale, dynamics and microphysics of rainfall events in the tropics, which are usually caused by local convection 11. Satellite data provide a useful alternative to fill the absence of locally measured data 12-15. However, no previous work has validated any satellite rainfall estimates for the Acre basin. Thus, this work aims to a) evaluate the satellite rainfall estimates for the Acre Basin; and b) analyze the correlation between these estimates and 1) the water level in three cities along the Acre River and 2) the incidence rates of infant diarrheal diseases across all municipalities in the Acre River Basin.

Study area
Acre State, located in the southwestern Amazon, is characterized by a seasonal, monsoon-type rainfall regime 7,16. Difficulties regarding the maintenance of equipment (stations or rain gauges) and logistical problems are some of the reasons why the data from some stations are not updated.

Rainfall observed data source
This study used daily records from the 12 rain gauges of the AcreBioClima Project 21 for the 2006-2013 period (Table 1). The selection was based on the availability of data from November to April. This criterion was applied because 83% of the total annual rainfall is recorded during this semester 17.
These rain gauges were split into Upper and Lower Acre (six for each), giving a balanced distribution of the available stations in order to show the behavior of the series.

Rainfall estimated data source
The Tropical Rainfall Measuring Mission (TRMM) 22 covers latitudes from 50°N to 50°S and provides information at a 3-hour interval through the 3B42RT product. This work used a derived version of the 3B42 data provided on a daily basis, available from 1998 to February 2015 15,23. TRMM's spatial resolution is 0.25° × 0.25°, which corresponds to grid cells of approximately 625 km².

Infant diarrheal diseases incidence cases
Statistics on infant diarrheal diseases were obtained from the Acre State Health Secretary for the 2000-2012 period on a monthly basis. This data series can be accessed through the PULSE-Brazil website 24.
Our focus was children under 5 years old, and the data were extracted from the Acute Diarrhea Daily Monitoring System, also known by the acronym SIVEP-DDA. The dataset was used to calculate the incidence rate for each municipality. According to the World Health Organization (WHO), diarrheal diseases are classified in Chapter I of the International Classification of Diseases, 10th revision (ICD-10), under codes A00-A09, which range from cholera to gastroenteritis and colitis of infectious and unspecified origin. In other words, the classification includes the three main causes of the disease: viruses, protozoa and bacteria 25.
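As an illustration of the incidence rate calculation mentioned above, the following minimal Python sketch expresses monthly cases per 1,000 children. It assumes that the denominator is the under-5 population of the municipality, which the text does not state explicitly; the function name and all figures are hypothetical and are not study data.

```python
def incidence_rate(cases, population_under5, per=1000):
    """Return the number of cases per `per` children under 5 years old."""
    if population_under5 <= 0:
        raise ValueError("population_under5 must be positive")
    return cases / population_under5 * per


# Hypothetical monthly case counts for one municipality (illustration only).
monthly_cases = {"2010-01": 12, "2010-02": 7}
population_under5 = 4000  # assumed under-5 population of the municipality

rates = {
    month: incidence_rate(cases, population_under5)
    for month, cases in monthly_cases.items()
}
```

With these made-up numbers, 12 cases among 4,000 children correspond to 3 cases per 1,000 children in that month.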

Satellite rainfall validation
Observed rainfall series were used to evaluate and validate the satellite data for this geographic location, where TRMM data had never been tested before.
Rainfall estimates from TRMM were extracted at the geographic coordinates of the AcreBioClima stations available in the Acre basin, and the two datasets were compared through a contingency table 26,27, as described in Wilks (2011, p. 261) 26. This verification procedure is essential to validate the quality and coherence of satellite-retrieved climate variables for a specific spatial and seasonal/temporal scale of analysis. Finally, validated estimates are more comparable to other studies and products.
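The extraction of satellite estimates at station coordinates can be sketched as a lookup of the grid cell that contains each rain gauge. The Python example below is only a sketch under stated assumptions: a regular 0.25° grid whose edges start at 50°S and 180°W (assumed, not taken from the text), hypothetical function names, and an illustrative coordinate pair that is not one of the AcreBioClima stations.

```python
import math

RES = 0.25      # grid resolution in degrees, as stated for TRMM 3B42
LAT0 = -50.0    # assumed southern edge of the grid
LON0 = -180.0   # assumed western edge of the grid


def cell_index(lat, lon):
    """Return (row, col) of the 0.25-degree cell containing the point."""
    if not (-50.0 <= lat < 50.0):
        raise ValueError("latitude outside the 50S-50N coverage")
    row = int(math.floor((lat - LAT0) / RES))
    col = int(math.floor(((lon - LON0) % 360.0) / RES))
    return row, col


def cell_center(row, col):
    """Return (lat, lon) of the center of the given cell."""
    return LAT0 + (row + 0.5) * RES, LON0 + (col + 0.5) * RES


# Illustrative coordinates only (roughly within Acre State).
row, col = cell_index(-9.97, -67.81)
center = cell_center(row, col)
```

Each station is then compared with the daily series of the single cell that contains it, which is why a gauge (a point measurement) and a cell of several hundred square kilometers are not expected to agree day by day.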
From the contingency table it was possible to calculate the following indexes: Proportion Correct (PC), Critical Success Index (CSI), Bias, False Alarm Ratio (FAR), Hit rate (H) and Probability of False Detection (POFD). The formulas for each of them are given in Equations 1-6.
Rainfall estimates and water level were correlated assuming that the daily estimates directly influence the water level at the three gauging stations chosen, and that a shorter or longer time lag would better represent that influence. For the correlation analysis between rainfall and disease, it was assumed that the monthly accumulated rainfall in each pixel within a municipality influences the incidence rate of diarrheal diseases in that municipality. For the correlation between river water level and diarrheal diseases, three combinations were chosen: monthly mean water level versus diarrhea incidence, monthly maximum water level versus diarrhea incidence, and monthly minimum water level versus diarrhea incidence. All cities were kept in the analysis regardless of their location in the basin (whether upstream or downstream of where the water level was measured).
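The six verification indexes named above can be computed from a 2 × 2 contingency table of rain/no-rain days. The Python sketch below assumes the standard definitions given by Wilks (hits a, false alarms b, misses c, correct negatives d) and a rain/no-rain threshold of 0 mm, which is an assumption because the text does not state the threshold used. Function names and the sample series are hypothetical.

```python
def contingency_counts(observed, estimated, threshold=0.0):
    """Count hits (a), false alarms (b), misses (c), correct negatives (d)."""
    a = b = c = d = 0
    for obs, est in zip(observed, estimated):
        rain_obs = obs > threshold
        rain_est = est > threshold
        if rain_obs and rain_est:
            a += 1          # event observed and estimated
        elif rain_est:
            b += 1          # event estimated but not observed
        elif rain_obs:
            c += 1          # event observed but not estimated
        else:
            d += 1          # no event in either series
    return a, b, c, d


def verification_scores(a, b, c, d):
    """Return the six indexes; assumes no denominator is zero."""
    n = a + b + c + d
    return {
        "PC": (a + d) / n,
        "CSI": a / (a + b + c),
        "Bias": (a + b) / (a + c),
        "FAR": b / (a + b),
        "H": a / (a + c),
        "POFD": b / (b + d),
    }


# Made-up daily rainfall (mm) at one gauge and its grid cell.
observed = [0, 5, 12, 0, 3, 0, 0, 8]
estimated = [0, 4, 0, 2, 6, 0, 1, 9]
counts = contingency_counts(observed, estimated)
scores = verification_scores(*counts)
```

For this toy series the counts are a = 3, b = 2, c = 1 and d = 2, so H = 0.75 and FAR = 0.4. A real series with no rain days, or no estimated rain days, would need explicit handling of zero denominators.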

Results

Testing the TRMM dataset
The contingency table analysis gave PC = 73% and CSI = 67%; for both indexes, the higher the value, the more accurate the estimates. The CSI value means that the total hits of the rainfall estimates outnumber the times they missed an event or generated a false alarm (estimated a rainfall event that was not recorded by the station). The bias obtained was 1.07, which indicates a balance: the satellite estimates neither under- nor overestimate the number of events. It also accounts for estimation accuracy. To evaluate reliability and resolution, FAR (22%) and H (83%) were calculated. The last index, POFD, was 49%.
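As a consistency check on the reported indexes, Bias and CSI can be recovered from H and FAR alone. The derivation below assumes the standard contingency-table definitions (a = hits, b = false alarms, c = misses), so that H = a/(a + c) and 1 - FAR = a/(a + b); it is a worked check, not part of the original analysis.

```latex
\begin{align}
\mathrm{Bias} &= \frac{a+b}{a+c}
              = \frac{a/(a+c)}{a/(a+b)}
              = \frac{H}{1-\mathrm{FAR}}
              = \frac{0.83}{0.78} \approx 1.06 \\
\frac{1}{\mathrm{CSI}} &= \frac{a+b+c}{a}
              = \frac{a+c}{a} + \frac{a+b}{a} - 1
              = \frac{1}{H} + \frac{1}{1-\mathrm{FAR}} - 1
              \approx 1.205 + 1.282 - 1 = 1.487 \\
\mathrm{CSI} &\approx \frac{1}{1.487} \approx 0.67
\end{align}
```

Both values agree with the reported Bias (1.07) and CSI (67%) within the rounding of H and FAR.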
Figure 1 b presents the observed and estimated rainfall data divided by region (Upper and Lower Acre). Median values for both data sources and both regions vary from approximately 200 to 260 mm per month. Maximum values are around 400 mm (estimated) and 500 mm (observed). The minimum values differ by less than 100 mm between the two sources and the two regions, and they are around 100 mm at the upper limit of the range.
In Figure 1 c the monthly estimated values closely follow the observed ones, which implies a good representation of the seasonal behavior. Correlation values between the rain gauge series (observed) and the TRMM values were around 0.4.
Figure 1 d introduces the exploratory analysis of water level, showing the observed daily average over the year, built from the time series available. Characteristics such as the location of the city relative to the main rivers and tributaries were taken into consideration when analyzing these results.
When a time-lag is applied, a spatial coherence is observed between the rainfall estimates and the water level at the Brasiléia gauging station. The highest correlation values were obtained for lags of 2 and 3 days, as shown in Figure 2, maps A and B (R ≈ 0.35). These values correspond to the pixels in light grey to the left of the white dot that indicates the station. The three pixels at the bottom left correspond to the municipality of Assis Brasil, which is less than 100 km from Brasiléia. This confirms that the water transit time is longer the further upstream the rainfall occurs.
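The time-lag analysis described above can be sketched for a single pixel as a Pearson correlation between rainfall on day t and water level on day t + lag. The Python example below is a minimal sketch with hypothetical function names and a synthetic series in which the water level responds to rainfall exactly two days later; it is not the study's implementation, which repeats the calculation for every pixel in the basin.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(rain, level, lag):
    """Correlate rainfall on day t with water level on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(rain, level)
    return pearson(rain[:-lag], level[lag:])


def best_lag(rain, level, max_lag):
    """Return the lag with the highest correlation and all results."""
    results = {lag: lagged_correlation(rain, level, lag)
               for lag in range(max_lag + 1)}
    return max(results, key=results.get), results


# Synthetic data: level rises 0.1 m per 1 mm of rain, two days later.
rain = [0, 10, 0, 0, 5, 0, 0, 20, 0, 0, 0, 3, 0, 0]
level = [2.0, 2.0] + [2.0 + 0.1 * r for r in rain[:-2]]
lag, results = best_lag(rain, level, max_lag=5)
```

In this synthetic case the two-day lag is recovered because the response was built in; with real data the maximum is far weaker (R ≈ 0.35 in this study) since the water level integrates rainfall over the whole upstream area.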
The Xapuri gauging station presents the strongest signals for lags of 3 and 4 days, as shown in Figure 2, maps C and D (R ≈ 0.35), which represent the critical time-lags for the water level in Xapuri to respond to rainfall events in the upstream basin. The Rio Branco gauging station shows the strongest signals for lags of 4 and 5 days, as shown in Figure 2, maps E and F (R ≈ 0.31). The pixels located upstream of the measurement point have stronger correlation coefficients as the time-lag increases. Short time-lags have low correlation values, indicating that rainfall relates to the water level at this gauging station more frequently after a minimum time-lag of 3 days.
In Figure 2 e, a pixel in the northwest part of the basin highlights an area of higher influence on the water levels (light grey). This particular pixel corresponds to the Espalha River, a small tributary of the Riozinho do Rola River. This pixel maintains higher values until the final time-lag is applied, even though its neighboring pixels present lower values as the lag increases. It is not clear why this pixel maintains a higher correlation value despite the decrease in the neighboring pixels. However, the fact that it lies next to the northern limit of the basin, an area that probably has higher altitudes, could explain why this area makes a larger contribution to the water level variation.

Correlation: rainfall estimates versus diarrheal diseases cases
Box plots of the incidence rate of infant diarrheal diseases are shown in Figure 1 e for the eleven municipalities. Because the incidence rates are low for most of the cities, the data are difficult to analyze. The median values for these cities are around 0.5 (less than one case per 1,000 children). The maximum extreme values are for Assis Brasil (> 4) and Acrelândia (> 3).
Figure 3 b and k show an overlap between the rainfall pixels and the cities of Epitaciolândia and Plácido de Castro, suggesting that the rainfall in that period influenced the diarrhea cases in these cities. However, the correlation is not as high as for Senador Guiomard and Acrelândia, where the pixels with the highest values are located upstream of the city.

Correlation: water level data versus diarrheal diseases cases
The results presented in this section comprise the correlation values obtained between the water level data and the diarrheal diseases cases (Figure 4 a-c).
Diarrheal diseases recorded in the neighboring cities of Epitaciolândia and Brasiléia show good correlation values with the Brasiléia gauging station (0.7 and 0.5, respectively; Figure 4 a). However, the Xapuri and Rio Branco gauging stations show low correlation values with the disease cases registered in their respective cities.
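The three water-level parameters compared with diarrhea incidence (monthly mean, maximum and minimum) can be derived from the daily series as sketched below in Python. The function names, the date-string format and all values are hypothetical and serve only to show the aggregation and the alignment of months before any correlation is computed.

```python
from collections import defaultdict


def monthly_level_stats(daily_levels):
    """Aggregate {'YYYY-MM-DD': level} into per-month mean, max and min."""
    by_month = defaultdict(list)
    for date, level in daily_levels.items():
        by_month[date[:7]].append(level)  # 'YYYY-MM' prefix as month key
    return {
        month: {"mean": sum(v) / len(v), "max": max(v), "min": min(v)}
        for month, v in by_month.items()
    }


def align_with_incidence(stats, incidence, key):
    """Pair one water-level statistic with incidence for shared months."""
    months = sorted(set(stats) & set(incidence))
    x = [stats[m][key] for m in months]
    y = [incidence[m] for m in months]
    return months, x, y


# Made-up daily water levels (m) and monthly incidence per 1,000 children.
daily_levels = {
    "2010-01-01": 3.0, "2010-01-02": 5.0, "2010-01-03": 4.0,
    "2010-02-01": 6.0, "2010-02-02": 8.0,
}
incidence = {"2010-01": 0.4, "2010-02": 0.9, "2010-03": 0.5}

stats = monthly_level_stats(daily_levels)
months, max_levels, rates = align_with_incidence(stats, incidence, "max")
```

Months present in only one of the two series (here 2010-03) are dropped, so data gaps shorten the paired series rather than being filled in; any Pearson routine can then be applied to the aligned pairs.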

Discussion

Rainfall estimates validation
Rainfall estimates obtained through remote sensing have been widely used, thanks to the enormous advances in these methods, in areas with difficult access and logistics (such as Amazonia).
The validation of the TRMM rainfall estimates using the contingency table found a bias close to unity, which is lower than the bias found by Santos e Silva et al. 28.
For the FAR index, the value obtained is reasonable when compared to the approximately 25% obtained by Santos e Silva et al. 28, but it is higher than the 0.15 found by Scheel et al. 29, who performed a similar validation using 3B42-V6 in Peruvian and Bolivian territory during the rainy season (DJF). The value obtained for H means that the estimates capture about four out of five rainfall events; it is a very good proxy in qualitative terms, being higher than the ~77% found by Scheel et al. 29. Moffitt et al. 30, analyzing H (which the authors named the Probability of Detection index, POD) for the previous version of the TRMM product, 3B42V6 (3h), found values of approximately 57% for rain gauges in the Ganges, Brahmaputra and Meghna basins in Bangladesh. Using these values as a threshold, the value obtained in this study is assumed to be a reasonable match. These authors also found that there are more hits during the rainy season (monsoon) than in the dry season (non-monsoon). Due to data scarcity, it was not possible to test this hypothesis in our study. The value obtained for the POFD index means that about 50% of the days with no observed rainfall were estimated as rainy by the satellite. This value is more than twice what was found by Moffitt et al. 30, who attributed it to an inherent uncertainty of the satellite estimates. Improvements in the newest version of the 3B42 (3h) algorithm are analyzed by Zulkafli et al. 14, such as the reduction of the negative bias, especially during the rainy season, probably due to the inclusion of more rain gauges in the validation process. For our purposes it was reassuring that the accuracy measures are above 60%, that the number of events is not overestimated, and that the false alarm ratio is small (less than 30%).
The TRMM product used in our analysis combines the best of the full set of TRMM sensors, and even data from other satellites; nevertheless, it is still a proxy for the real data (an indirect measurement) and it has limitations. Convection in this region is characterized by weak horizontal fluxes and by predominantly vertical momentum over time scales of about one hour 31, which are not properly captured by the sensors and may lead to over- or underestimates. However, TRMM estimates provide a good dataset to analyze rainfall events in this area, and their main advantages are availability, spatial resolution and series length (15 years). The estimated rainfall pattern in the Upper and Lower Acre is similar to the observed one, although the correlation between the two datasets is low (~0.4). Our hypothesis for the low correlation is that the datasets were compared directly, instead of using mean or accumulated values per municipality. No statistical techniques or filters, such as those listed by Collischonn et al. 13, were applied to smooth possible failures in these series.
Although the correlation values are statistically low, we classified them as relevant or high relative to the other results presented in each map or graph.

Rainfall estimates versus water level
The correlation between rainfall estimates and water level showed a good correspondence, despite the low values, for the three cities analyzed. The seasonal analysis (not shown) did not improve the results shown here. A hypothesis that might explain this is that longer time intervals smooth out other factors that might influence the seasonal behavior of the river.
There is no correlation between instantaneous (zero-lag) values of the rainfall estimates and either water level or disease. However, the results improve when a time-lag of 1 to 4 days is applied in the rainfall and water level analysis. After a 5-day time-lag the correlation values tend to be very low.
Comparing the three gauging stations in the first analysis performed (rainfall estimates and water level), the best correlation value was found in Xapuri for a 3-day time-lag (R = 0.35).
The results for the Brasiléia gauging station might be influenced by the exclusion of the part of the basin that does not belong to Brazilian territory.
When rainfall near the station influences the water level, it does so for only a few hours, because in some cases this rain joins the river downstream or even infiltrates the soil. The effects of flash floods, with a rapid response of the water level, are not registered in the official records because they usually last less than 24 hours.

Diarrheal diseases analysis
An acceptable correlation was found between the hydrologic variables (rainfall and water level) and diarrheal diseases. The limited strength of this correlation may be due to the short time series available, to data gaps and to the temporal interval at which the analyses were performed (e.g. diarrheal cases are grouped monthly). The results might improve if the disease statistics were available at a higher frequency (e.g. weekly), as also suggested by Curriero et al. 32.
This study was not able to show the relationship of rainfall and water level with diarrheal diseases during the rainy period. However, for Epitaciolândia and Plácido de Castro the correlation values obtained for rainfall and diarrheal disease suggest that rainfall around the city could influence the disease series. Carlton et al. 33 also suggested that the highest incidence of diarrheal diseases occurs during the rainy season and that the most affected cities are those with low sanitation coverage. Torres et al. 34 assert that population growth in urban areas increases the pressure on and demand for the sanitation system. Although improvements in basic health care helped to reduce the number of deaths caused by diarrheal diseases, the lack of adequate sanitation systems has maintained the number of cases. According to ACRE 35, Rio Branco, as the state's largest city, provides median infrastructure and health services and concentrates most of the state's population. A study by Oliveira et al. 36 for Minas Gerais State, in the Southeast region of Brazil, showed that increasing the coverage of the sanitation system, combined with the current coverage of the water system, could reduce by almost 5% the fraction of diarrheal disease in children under 5 years attributed to these two infrastructure characteristics.
No coherent values were found between rainfall and diarrheal diseases cases in Brasiléia; however, high correlation values were found between the water level data in Brasiléia and diarrheal diseases cases in both Epitaciolândia (0.7) and Brasiléia (0.5). This may be because the rainfall analysis does not include the Bolivian part of the basin, whereas the water level data indirectly include the runoff from the excluded area, or because of a different response time between the rainfall event and a possible impact on infants' health. Another possible explanation for these findings is the distance between the place of residence and health services. However, we could not obtain these data in time for the analysis.

Conclusion
This study showed that satellite rainfall estimates are a good indicator of the rainfall distribution in the Acre basin. This allows the data series to be used to support local public policies and the actions of various groups, such as civil defense and health professionals, to better plan and prioritize the financial and human resources available and to mitigate the impact of floods on the local population. Moreover, these estimates also allowed an analysis of the influence of upstream rainfall on the water level at three gauging stations in the Acre basin.
This is a first approach to combining satellite information with water level data for the analysis of children's diarrheal diseases in the Amazon basin. Deeper statistical analysis is needed, which could allow a better understanding of these processes. Knowing the response time between a rainfall event and its impact on the river, and the consequences for the population's health, is important for planning response and early warning in case of floods or extreme rainfall events. Recent studies using climate projections indicate an increase in extreme rainfall events in the Amazon 37,38 and in the discharge of Amazonian rivers 39. This type of study can contribute to projections of the health impacts associated with the flooding of Amazonian rivers, and consequently help government entities design early-action strategies for local people. Such analysis can improve preparedness and help the government adapt its strategies to help the population. The location, strength and time-lag of the spatial correlations calculated here indicate that rainfall is not the only environmental variable explaining the behavior of water level or the occurrence of infant diarrheal diseases cases.
Finally, considering the possible relationship between diarrheal diseases and rainfall behaviour, and the fact that other states in the North region of Brazil have worse diarrheal disease rates, there is scope to replicate this methodology in other contexts within the Brazilian Amazon.

Collaborations
PAM Fonseca participated in all phases of the study, including the literature review, data gathering, analysis and writing of the manuscript. SS Hacon was responsible for project supervision and guidance, the methodological approach, data analysis, discussion and the final revision of the text. VL Reis was responsible for project supervision, data gathering and the review process. D Costa and IF Brown participated in data gathering, analysis, writing and the review process.

Acknowledgments
The authors thank the National Council for Scientific and Technological Development (CNPq) and the FINEP project Rede CLIMA 2 for financial support, as well as the Acre Government and the PULSE-Brasil project (NERC). We would also like to thank ENSP/FIOCRUZ (National School of Public Health / Oswaldo Cruz Foundation). Special thanks to Dr. Alejandro Duarte Fonseca (AcreBioClima Project) for maintaining such an important project for many years and thus providing an important contribution to society; to Dr. Elisa Armijos from PPG-CLIAMB/UEA-INPA for the revision support; and to Diego Viana from the Acre State Health Secretary for the technical support.

Figure 1. A - Municipalities within the Acre River Basin, identified by the code numbers available in the shapefile provided by UCEGEO/AC. The municipalities belong to two Acre regions: Alto Acre (Upper Basin): 16 - Assis Brasil, 17 - Brasiléia, 18 - Epitaciolândia, 19 - Xapuri; and Baixo Acre (Lower Basin): 4 - Rio Branco, 20 - Capixaba, 8 - Bujari, 5 - Porto Acre, 2 - Senador Guiomard, 21 - Plácido de Castro, 22 - Acrelândia. Black dots represent the geographical locations where the water level series were obtained. The Riozinho do Rola and Xapuri sub-basins are delimited to highlight the proportion and importance of each one relative to the total extent of the Acre Basin. B - Accumulated rainfall for the NOV-APR semester from 2006 to 2013, split between observed and estimated values and spatially between Upper and Lower Acre; C - Average accumulated rainfall in the Upper and Lower Acre regions considering the geographical locations of the rain gauges, with values from NOV 2006 to APR 2013; D - Daily mean water level for each municipality, calculated for specific year intervals according to what was available for each location; E - Incidence rate of infant diarrheal diseases in the Acre River basin municipalities from Jan/2000 to Dec/2012.
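Equation 1: Proportion correct (PC)
$$\mathrm{PC} = \frac{a + d}{n}$$
where a = rainfall events both observed and estimated; d = rainfall events neither observed nor estimated; n = number of days analyzed, or sample size.

Equation 2: Critical success index (CSI)
$$\mathrm{CSI} = \frac{a}{a + b + c}$$
where b = rainfall events estimated but not observed; c = rainfall events observed but not estimated.

Equation 3: Bias, the ratio of estimated to observed rainfall events
$$\mathrm{Bias} = \frac{a + b}{a + c}$$

Equation 4: False alarm ratio (FAR)
$$\mathrm{FAR} = \frac{b}{a + b}$$

Equation 5: Hit rate (H)
$$H = \frac{a}{a + c}$$

Equation 6: Probability of false detection (POFD)
$$\mathrm{POFD} = \frac{b}{b + d}$$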


Figure 2. Best correlation values between rainfall estimates and water level measured at Brasiléia, Xapuri and Rio Branco. The white dots indicate where the river measurements were obtained.

Figure 3. Correlation results between rainfall estimates for the whole Acre River basin and infant diarrheal disease cases for all municipalities in the Acre basin (A-G) for the Dec-Jan-Feb trimester.

Figure 4. Correlation between the Acre River water level at Brasiléia (a), Xapuri (b) and Rio Branco (c) and infant diarrhea in the cities that comprise the Upper and Lower Acre regions. Correlation values were obtained using the maximum, mean and minimum monthly water level for the DJF trimester of each dataset.

Table 1. Rain gauges with data available for the NOV-APR semester from 2006 to 2013.