Fuzzy model to estimate the number of hospitalizations for asthma and pneumonia under the effects of air pollution

ABSTRACT OBJECTIVE Predict the number of hospitalizations for asthma and pneumonia associated with exposure to air pollutants in the city of São José dos Campos, São Paulo State. METHODS This is a computational model using fuzzy logic based on Mamdani’s inference method. For the fuzzification of the input variables of particulate matter, ozone, sulfur dioxide and apparent temperature, we considered two relevancy functions for each variable with the linguistic approach: good and bad. For the output variable number of hospitalizations for asthma and pneumonia, we considered five relevancy functions: very low, low, medium, high and very high. DATASUS was our source for the number of hospitalizations in the year 2007 and the result provided by the model was correlated with the actual data of hospitalization with lag from zero to two days. The accuracy of the model was estimated by the ROC curve for each pollutant and in those lags. RESULTS In the year of 2007, 1,710 hospitalizations by pneumonia and asthma were recorded in São José dos Campos, State of São Paulo, with a daily average of 4.9 hospitalizations (SD = 2.9). The model output data showed positive and significant correlation (r = 0.38) with the actual data; the accuracies evaluated for the model were higher for sulfur dioxide in lag 0 and 2 and for particulate matter in lag 1. CONCLUSIONS Fuzzy modeling proved accurate for the pollutant exposure effects and hospitalization for pneumonia and asthma approach.


INTRODUCTION
Air pollution is a serious environmental issue, given its impact on human health, especially cardiovascular and respiratory systems 1,2 .
In Brazil, especially in large urban centers, exposure to pollutants like particulate matter (PM 10 ), sulfur dioxide (SO 2 ), ozone (O3), carbon monoxide (CO) and nitrogen oxides (NOx) is associated with hospitalizations for respiratory diseases such as asthma and pneumonia, thanks to the pollutants easy reach to the respiratory tract 3,16 .
Currently, this issue extends beyond major urban centers 7,11 and is affecting cities of medium and small size 1,6,8 . Exposure to air pollutants represents the high financial cost for the public network. The cost of 900,000 hospitalizations for pneumonia and asthma in 2011 reached US$350 million in Brazil, US$70 million in São Paulo State, with 150,000 hospitalizations, and US$800,000 in the city of São José dos Campos, with 1,900 hospitalizations a , representing a public health problem.
The statistical techniques of logistic regression and Poisson's regression (Generalized Linear Models or Generalized Additive Models) are often used to estimate the chance or risk of hospitalization or death for respiratory diseases 9 .
A new form of epidemiological data analysis in public health is fuzzy logic. The fuzzy set theory was introduced by Lotfi A. Zadeh 23 , in 1965, and can work with the vague aspect of information particular to human understanding and very common in the medical field, in which descriptions of diseases often comprise language terms that are inevitably vague, such as fever (high or low) 15 . Unlike the classical set theory, in which an element belongs or does not belong to a set, in the theory of fuzzy sets an element may belong to more than one set with different degrees of relevance (varying between zero and one). These fuzzy sets are represented by the relevancy function, whose determination depends on the specialist's individual perception and the data at hand 17 .
The ability to deal with linguistic terms can explain the increase in the number of studies that use fuzzy logic in biomedicine problems. In fact, the fuzzy logic theory has become an important approach in diagnosis systems, prognosis, forecasting models, medical treatment and, more recently, in epidemiology and public health 13,15,17,18,20,22,23,b .
This study's objective was to develop a computational model using fuzzy logic to estimate the influence of exposure to air pollutants in the number of hospital admissions for asthma and pneumonia.

METHODS
This is a computational model using fuzzy logic to estimate the number of hospitalizations for pneumonia and bronchial asthma per the concentrations of the pollutants particulate matter (PM 10 ), ozone (O 3 ) and sulfur dioxide (SO 2 ) and the apparent temperature (TEMPap), calculated by considering ambient temperature and relative air humidity 2 .
The environmental pollutants and climatic variables data in this study were obtained from the Environmental Company of the State of São Paulo (CETESB), which has a measuring station in the city of São José dos Campos. The data of the number of hospitalizations for pneumonia and bronchial asthma (ICD-10: J12 to J18 and J45) in individuals of all ages residing in São José dos Campos, in the period from 1/1/2007 to 12 Alto Vale do Paraíba, 600 m above sea level, has wet weather and tropical altitude, located between São Paulo and Rio de Janeiro -the two largest cities in Brazil; is cut through by the Via Dutra, the most important highway in the country with heavy traffic of buses and trucks. Its population is estimated to be around 650,000 inhabitants.
The model was developed with the help of a specialist who created two functions of relevance for the input variables: PM 10 -good and bad; O 3 -good and bad; SO 2 -good and bad; and TEMPap -good and bad. These fuzzy sets were drawn based on the data obtained from CETESB c .
The output variable was the number of hospitalizations for asthma and pneumonia and its five relevance functions we classified as follows: very low, low, medium, high and very high. The fuzzy linguistic model is a rule-based inference system using fuzzy set theory to address the phenomenon. Its structure includes four components: • The fuzzifier, which transforms real inputs (also known as crisp) into fuzzy values; • The rule base, which defines the connection between the system's inputs and outputs.
A fuzzy rule has the following form: if precedent, then consequent, in which the precedent can be composed of one or more fuzzy sets connected by fuzzy operators. And the consequent represents the fuzzy values of output variables; • The inference system, which evaluates all rules, checks which have been activated (with degree of relevance greater than zero) and combines the resulting weights of all the rules enabled on a single output (in this study, we used the Mamdani's inference system); and • The defuzzifier that performs the reverse process of the fuzzifier, that is, it transforms the fuzzy output into an actual value (crisp) 21 .
When performing the combination of all possible entries, it was possible to develop 16 rules resulting from the combination of the four entries with two functions of relevance each (2 × 2 × 2 × 2), drawn up with the aid of an expert. For example, two combinations could be: Through a routine (toolbox fuzzy) that is part of the program MATLAB 21 , we obtained the numerical output resulting from the fuzzy model, providing the number of hospitalizations for each combination of the database entry. After obtaining the model's result, we performed a Pearson correlation with actual admissions data, using up to two days of lag (lag 2) because the display effect can manifest on the same day (lag 0) or on subsequent days. The accuracy, along with its 95% confidence interval was estimated using the ROC curve.  Twelve times ozone exceeded the value used by the National Council for the Environment (CONAMA) d . Particulate matter and sulfur dioxide did not exceed those limits.
Relevancy functions of input variables: PM 10 ; O 3 ; SO 2 and TEMPap are shown in Figure 1 (A-D). The output variable: number of hospitalizations for asthma and pneumonia with its five relevancy functions are shown in Figure 2.
The results obtained by the ROC curve are presented in Table 2. The best result was obtained for the zero-day lag (lag 0), PM 10    The Pearson correlation coefficient between the output of the fuzzy model and the actual data were 0.38 for lag 0; 0.36 for lag 1 and 0.30 for lag 2 (p < 0.01); even though the values are not high, they were significant.

DISCUSSION
This article presents the development of a computational model using fuzzy logic to estimate the number of hospitalizations for asthma and pneumonia associated with air pollutants in a midsize industrial town. The results showed good accuracy when predicting the number of hospitalizations when exposure occurred on the same day and up to two days later. Additionally, we saw the acute effect the exposure to pollutants has in hospitalizations.
This study showed a maximum concentration of PM 10 of 89 μg/m 3 , which is lower than the air quality advocated by CETESB c , which is 120 μg/m 3 . According to CETESB, PM 10 indexes between 51 and 100 μg/m 3 classify an air of moderate quality and can cause symptoms such as a dry cough and fatigue in patients with respiratory diseases.
In the study by Arbex et al. 2 , PM 10 may lead to airway irritation, inflammation, increased bronchial reactivity and decreased mucociliary activity, and its consequences are an increase in asthma attacks and respiratory infections.
In research conducted by Gouveia et al. 12 in the city of São Paulo, State of São Paulo, PM 10 presented an average of 54.5 μg/m 3 , a value well above this study's average, which was 26 μg/m 3 . We found that, with a 10 μg/m 3 increase in the concentration of particulate matter, there was an increase of approximately 5% in the number of hospitalizations for asthma in children. Similarly, Nascimento et al. 16 observed that, with an increase of 24.7 μg/m 3 on the average concentration of PM 10 , there was an increase of 9.8% in  In the study performed with data from 2004 and 2005 in the same city 1 , the average was 25.2 μg/m 3 , a value close to the one found in this study, and hospitalizations for asthma were associated with exposure to PM 10 .
As for O 3 , the average value found in this study was o 91 μg/m 3 for the period of 2007, close to the value found by Amâncio and Nascimento 1 in São José dos Campos with an average of 74.3 μg/m 3 , by Gouveia et al. 12 , with an average of 71.8 μg/m 3 , and greater than the one observed by Negrisoli and Nascimento 19 , with an average of 37.1 μg/m 3 . On 12 days, the O 3 exceeded the values of acceptable levels (up to 160 μg/m 3 ) at CONAMA d . In CETESB' s c report, the inappropriate values for O 3 are between 180μg/m 3 and 200 μg/m 3 and can aggravate asthma symptoms in children with respiratory disease. In the general population, it can lead to the following symptoms: a dry cough, fatigue and burning sensation in the nose, throat, and eyes. The concentration of O 3 had a significant growth in the last year in the city of São José dos Campos, per CETESB's latest report c .
The concentration of SO 2 had an average of 3.8 μg/m 3 , which is within the values tolerated by the World Health Organization (WHO), which accepts a maximum exposure level of up to 20 μg/m 3 for 24 hours. This value was like the one found previously (4.6 μg/m 3 ) in the same city 1 , with data collected between 2004 and 2005, and another study, also in São José dos Campos, averaging 6.2 μg/m 3 , with data from 2000 and 2001 16 .
In a research conducted in the city of São Paulo, Gouveia and Fletcher 10 found an average concentration of SO 2 of 17.71 μg/m 3 , which differs from the value found in this study. This difference can be explained by the larger fleet of vehicles in São Paulo, a major source of this pollutant.
The pollutants cited in this article and their magnitudes are described in national articles about this subject 1-7 and it is important to note that the data of hospital admissions are related to hospitalization in the public health system. Even at concentrations considered safe, the pollutants can cause adverse health effects, especially on the cardiovascular and respiratory systems. This study differs from other studies that estimate risks or chances of hospitalizations caused by exposure to air pollutants using logistic regression, retrospective studies or Poisson's regression 1,3,5,6 . In these models, fuzzy logic has the advantage of dealing with the uncertainty of information present in the meanings of words, facilitating dialogue between health care professionals and computational experts.
The fuzzy model showed itself to be very satisfactory by associating exposure to pollutants with the number of hospitalizations when compared with real outputs, with Pearson correlation coefficient of 0.38. This methodology was used recently in the same city to analyze the average time of hospitalization arising from pneumonia 18 . In the national literature, there are also articles with fuzzy application in establishing the risk of neonatal death 15,17 and neonatal resuscitation prediction 20 .
In epidemiological studies, it is common to find an association between the concentrations of air pollutants and the health effects on the next day, after two days or even after a week. Researchers generally adjust the model for different arrangements of the same database with lags. In time-series studies, lags of one to seven days are often used 4,14 .
The data obtained by the ROC curve of the model (Table 2) showed excellent accuracy and PM 10 , O 3 , and SO 2 had a good performance with lag zero, which allows us to predict the effects of these pollutants on the same day that exposure occurs. The best result was for the pollutant SO 2 at zero lag, with great accuracy and area under the curve of 95%. Thus, we can conclude that the more PM 10 and SO 2 are present in the atmosphere, the more hospitalizations of patients with asthma and pneumonia will occur.
The actual data regarding the number of hospitalizations ranged from zero to 16 and those obtained by the fuzzy model ranged from 0.7 to 13.9. This difference between minimum and maximum is due to the defuzzification method of the fuzzy inference system, in that the result is obtained by calculating the area center. Therefore, it would be mathematically difficult to obtain the area center around the ends of the universe of discourse of the output variable.
To improve the model, we can include more relevance functions in the input variables, which would involve a larger number of rules; or use different formats of relevance functions, such as Gaussian or triangular. The concentrations of pollutants are considered homogeneous for the implementation of this approach, which can be considered a limitation, since concentrations in other areas of the same city may differ from those recorded near the monitoring station.
This research is a low-cost financial tool and can be presented in a specific computer program (expert system) for this purpose, and does not require the opinion of experts. The model can be implemented in public health systems and can serve as an important instrument for prevention and decision-making regarding changes in the level of pollutants. It can also be applied in any locations where there are available data on pollutants and climatic conditions.