Establishing the risk of neonatal mortality using a fuzzy predictive model

Nascimento LFC et al. Cad. Saúde Pública, Rio de Janeiro, 25(9):2043-2052, Sep, 2009

The objective of this study was to develop a fuzzy model to estimate the possibility of neonatal mortality. A computing model was built, based on the fuzziness of the following variables: newborn birth weight, gestational age at delivery, Apgar score, and previous report of stillbirth. The inference used was Mamdani's method and the output was the risk of neonatal death given as a percentage. Twenty-four rules were created according to the inputs. The model was validated with a real data file containing records from a Brazilian city. The receiver operating characteristic (ROC) curve was used to estimate the accuracy of the model, while average risks were compared using the Student t test. MATLAB 6.5 software was used to build the model. The average risks were smaller in surviving newborns (p < 0.001). The accuracy of the model was 0.90. The highest accuracy occurred with risk below 25%, corresponding to a sensitivity of 0.70, a specificity of 0.98, a negative predictive value of 0.99 and a positive predictive value of 0.22. The model showed good accuracy, as well as a good negative predictive value, and could be used in general hospitals.

Neonatal Mortality; Fuzzy Logic; Medical Informatics Computing; Risk Factors; Predictive Value of Tests


Introduction
Uncertainty, vagueness, and imprecision are very common in medicine. In assessments such as fever (high or low) and weight (high or low), the best and most useful descriptions often involve terms that are unavoidably vague.
Fuzzy set theory was developed to deal with the concept of partial truth values, ranging from completely true to completely false, and has become a powerful tool for dealing with imprecision and uncertainty, aiming at tractability, robustness and low-cost solutions for real-world problems.
These features and the ability to deal with linguistic terms could explain the increasing number of works applying fuzzy logic to problems in medicine 1,2. In fact, the theory of fuzzy sets has become an important mathematical approach in diagnosis systems 3 and, more recently, in epidemiology and public health 4. For example, a model using birth weight and gestational age was used to estimate neonatal death risk 5.
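Neonatal mortality is defined as death occurring up to the 28th day of life, and it is a very important population health indicator. This indicator provides information on social welfare, and on ethical and political aspects of a population under certain conditions. Low birth weight (birth weight below 2,500g), preterm birth (birth before 37 completed weeks of gestation) 6, severe depression at birth (Apgar score below seven) and previous reports of stillbirth are important causes of neonatal mortality. The incidences of low birth weight and preterm birth in Brazil are around 7% (Department for Informatics at the Unified National Health System. http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sinasc/cnv/nvuf.def, accessed on 14/Jun/2007). Neonatal mortality in the State of São Paulo, the most industrialized state in Brazil, was 9.89/1,000 live births in 2004 (Department for Informatics at the Unified National Health System. http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sim/cnv/infuf.def, accessed on 14/Jun/2007).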
The estimate of risk of neonatal death can provide important information to pediatricians, especially to neonatal intensive care physicians, with respect to the attention a newborn requires. It is evident that the care provided to a newborn infant will differ depending on the hospital and its location. In fairly small hospitals it is common for there to be no pediatrician present at the time of birth, and other professionals are in charge of evaluating the newborn 5.
To estimate the risk of neonatal death, a regression model using dichotomous independent variables, such as Yes or No, Present or Absent, has been applied 7. Fuzzy logic, by contrast, allows a newborn with a birth weight of 2,350g, for instance, to be assigned to the fuzzy subset low birth weight with a membership degree of 0.63 and to the fuzzy subset normal birth weight with a membership degree of 0.25, taking into account the inherent uncertainties of this record. In fact, a newborn weighing 2,490g at birth and another weighing 2,510g at birth, who are classically categorized as low birth weight and normal birth weight respectively, do not show significant biological, anatomical or physiological differences. In the fuzzy approach each element may be compatible with several categories, with different membership degrees. The advantage of fuzzy theory is that it provides a smoother and more realistic classification of the children with respect to the two variables assumed 5.
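The graded classification described above can be illustrated with a minimal Python sketch. The breakpoints used below (2,000g, 2,950g, 2,100g and 3,100g) are assumptions chosen only so that the example reproduces the membership degrees quoted in the text for 2,350g; they are not the membership functions of Figure 2, and all function names are hypothetical.

```python
def rising(x, a, b):
    """Membership that is 0 at or below a, 1 at or above b, and linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def falling(x, a, b):
    """Mirror image of rising: 1 at or below a, 0 at or above b."""
    return 1.0 - rising(x, a, b)


# Hypothetical breakpoints in grams (assumed for illustration, not from Figure 2).
# Only the upper shoulder of "low birth weight" is modeled here.
def low_birth_weight(grams):
    return falling(grams, 2000, 2950)


def normal_birth_weight(grams):
    return rising(grams, 2100, 3100)


for grams in (2350, 2490, 2510):
    print(grams,
          round(low_birth_weight(grams), 2),
          round(normal_birth_weight(grams), 2))
```

Under these assumed breakpoints, newborns of 2,490g and 2,510g receive nearly identical degrees in both subsets, whereas a crisp 2,500g cut-off would place them in different categories.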
The theory of fuzzy sets was introduced by Lotfi A. Zadeh in the 1960s as a means to model the uncertainty within natural language, and it introduced the concept of vagueness. According to this alternative view, uncertainty is considered essential to science. The book by Yen & Langari 8 is recommended to the reader who wishes to learn more about fuzzy logic theory.
Thus, a theoretical fuzzy linguistic model is presented in this study: a low-cost program able to evaluate more appropriately the risk of neonatal death based on birth weight, gestational age, Apgar score and previous report of stillbirth.

Methods
A computational model based on a fuzzy linguistic model is used to evaluate the risk of neonatal death. This model involves the four inputs named above: birth weight, gestational age, Apgar score and previous report of stillbirth. The model was developed from the knowledge of one expert, who defined three fuzzy sets for the variable birth weight (very low birth weight, low birth weight and normal birth weight); two fuzzy sets for the variable gestational age (preterm and term); two fuzzy sets for the variable Apgar score (low, when the values were below seven, and high, when the values were above eight); and two fuzzy sets for the variable previous report of stillbirth (few, if there were zero or one stillbirths, and many, if there were two or more stillbirths). The output is the death risk, with five linguistic labels: very high, high, middle high, middle and low. These fuzzy sets were built by fuzzifying the classical pediatric classification. Situations such as small for gestational age, adequate for gestational age and large for gestational age were not considered in this study.
A fuzzy linguistic model is a rule-based system that uses fuzzy set theory to address the issue. Its basic structure includes four main components, as shown in Figure 1:
• A fuzzifier, which translates crisp inputs (classical numbers) into fuzzy values;
• An inference engine, which applies a fuzzy reasoning mechanism to obtain a fuzzy output (in the case of Mamdani inference);
• A knowledge base, which contains both a set of fuzzy rules and a set of membership functions representing the fuzzy sets of the linguistic variables; and
• A defuzzifier, which translates the fuzzy output into a crisp value.
The decision process is performed by the inference engine using the rules contained in the rule base. These fuzzy rules define the connection between fuzzy input and output. A fuzzy rule has the form "if antecedent then consequent", where the antecedent is a fuzzy expression composed of one or more fuzzy sets connected by fuzzy operators, and the consequent is an expression that assigns fuzzy values to the output variables. The inference process evaluates all rules in the rule base and combines the weighted consequents of all relevant rules into a single output fuzzy set (Mamdani's model). The fuzzy output set may then be replaced by a "crisp" output value obtained by a process called defuzzification 8.
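The rule evaluation and aggregation steps can be sketched in Python as follows. The sketch assumes the operators most commonly paired with Mamdani inference (minimum for "and", maximum for aggregation); the text does not state which operators were configured, and the membership degrees and function names below are hypothetical.

```python
def firing_strength(antecedent_degrees, weight=1.0):
    """AND the antecedents with min, then scale by the rule weight (0 to 1)."""
    return weight * min(antecedent_degrees)


def aggregate(fired_rules):
    """For each output label, keep the strongest firing strength (max)."""
    combined = {}
    for label, strength in fired_rules:
        combined[label] = max(combined.get(label, 0.0), strength)
    return combined


# Hypothetical membership degrees, ordered as
# (birth weight, gestational age, Apgar score, previous stillbirths).
fired = [
    ("very high", firing_strength([0.0, 0.2, 1.0, 1.0])),  # very low, preterm, low, few
    ("middle", firing_strength([0.7, 0.8, 1.0, 1.0])),     # normal, term, low, few
]
print(aggregate(fired))
```

A rule whose antecedents include a membership degree of zero contributes nothing to the output, so only the rules compatible with the newborn's record shape the aggregated fuzzy set.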
The base rules are given in Table 1. When a newborn has very low birth weight, is preterm, has a low Apgar score and the previous report of stillbirth is few, then the risk of neonatal death is very high, as shown by rule 1. Note that the sequence of inputs is: birth weight; gestational age; Apgar score; previous report of stillbirth. The output is the risk of neonatal death, obtained after the step named defuzzification. Centroid was the defuzzification method used in this study, and the risk of neonatal death was estimated as a percentage.
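Centroid defuzzification can be sketched in Python as below. This is a minimal numerical approximation on an evenly spaced grid, assuming that each output set is clipped at its firing strength and that the clipped sets are combined by maximum; the triangular output sets and firing strengths are hypothetical and are not those of the model described here.

```python
def triangle(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centroid(clipped_sets, lo=0.0, hi=100.0, steps=1000):
    """Centroid of the max-aggregated, clipped output sets on an evenly spaced grid.

    clipped_sets is a list of (membership_function, firing_strength) pairs.
    Returns None when no rule fired.
    """
    numerator = denominator = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = max(min(strength, mf(x)) for mf, strength in clipped_sets)
        numerator += x * mu
        denominator += mu
    return numerator / denominator if denominator > 0 else None


# Hypothetical output sets on the 0-100% risk scale (assumed shapes).
def middle(x):
    return triangle(x, 0.0, 25.0, 50.0)


def high(x):
    return triangle(x, 50.0, 75.0, 100.0)


risk = centroid([(middle, 0.7), (high, 0.2)])
print(round(risk, 1))
```

When a single symmetric set fires, the centroid is simply the center of that set; when several sets fire, the crisp risk is pulled toward the sets with the larger clipped areas.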
Note that, by combining all possible inputs, it is possible to build 24 rules. The procedure of the fuzzy linguistic model, given the four inputs above for any child, consists of calculating the membership degree of these values in all fuzzy sets of birth weight, gestational age, Apgar score and previous report of stillbirth. Next, the risk of neonatal death is determined by inference of the fuzzy rule set, using Mamdani's inference and defuzzification of the fuzzy output.
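The count of 24 rules follows from taking one linguistic label per input: 3 × 2 × 2 × 2 = 24. A short Python sketch of this enumeration is given below; the variable names are illustrative only.

```python
from itertools import product

# Linguistic labels of the four inputs, as listed in the Methods.
birth_weight = ["very low", "low", "normal"]
gestational_age = ["preterm", "term"]
apgar_score = ["low", "high"]
previous_stillbirth = ["few", "many"]

# Every combination of one label per input is the antecedent of one rule.
antecedents = list(
    product(birth_weight, gestational_age, apgar_score, previous_stillbirth)
)

print(len(antecedents))  # 3 * 2 * 2 * 2 = 24
print(antecedents[0])    # ('very low', 'preterm', 'low', 'few')
```

The first combination matches the antecedent of rule 1 quoted above. The consequent of each rule comes from the expert and is not derived by the code, and the ordering of the remaining rules in Table 1 is not assumed here.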
The fuzzy sets related to the linguistic variables birth weight, Apgar score, previous report of stillbirth and gestational age are presented in Figure 2.
This model was validated using a real data set that contains the same variables as the defined fuzzy sets. The data set was taken from São José dos Campos, a mid-sized city in the Southeast of Brazil, in 2003. This data file contained information from the Brazilian Birth Certificate, an official document necessary for civil registration, and information about the newborn's status up to 28 days of life (dead or alive). The accuracy of the model was estimated by the ROC (receiver operating characteristic) curve and the risk values were evaluated using the Student t test. The median test or the Mann-Whitney test was used if the risk values did not have a normal distribution. The MATLAB software (MathWorks, Natick, USA) was used to perform the simulation.
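The area under the ROC curve can be read as the probability that a randomly chosen newborn who died receives a higher risk value than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes the area this way. It is not the procedure used in the study, and the risk values are invented for illustration.

```python
def area_under_roc(risks_deaths, risks_survivors):
    """Fraction of (death, survivor) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for death_risk in risks_deaths:
        for survivor_risk in risks_survivors:
            if death_risk > survivor_risk:
                wins += 1.0
            elif death_risk == survivor_risk:
                wins += 0.5
    return wins / (len(risks_deaths) * len(risks_survivors))


# Invented risk values (%), not taken from the study data.
deaths = [90.3, 25.0, 4.67]
survivors = [4.67, 4.67, 25.0, 4.67]
print(area_under_roc(deaths, survivors))
```

An area of 0.5 corresponds to a model that ranks deaths and survivors no better than chance, and an area of 1.0 to a model that assigns every death a higher risk than every survivor.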

Results
There were 58 neonatal deaths in 1,351 records. The mean of the risk values was 9.85% (SD = 14.02), the range was 4.67-90.33% and the median was 4.67%. The Kolmogorov-Smirnov test showed that the risk values do not have a normal distribution (z = 5.47, p < 0.001). The Mann-Whitney test resulted in a mean rank of 1,194.01 for neonatal deaths and 652.76 for live newborns (z = -14.79, p < 0.001). The median test resulted in 49 neonatal deaths with a risk value above the median (4.67%) and 1,071 live newborns with a risk value equal to or below the median (χ² = 152.7, p < 0.001).
Figure 3 shows the membership functions of the output variable risk of neonatal death. The surfaces of the neonatal death risk as a function of gestational age and birth weight (in grams), and of Apgar score and birth weight (in grams), are shown in Figure 4.
Note to Table 1: (1) is the weight of each rule, which can range from 0 to 1.
It can be noted in these graphs that the risk of neonatal death decreases monotonically as birth weight or gestational age increases, as expected, and likewise as the Apgar score increases (Apgar score vs. birth weight).
In order to validate the computational model created, six records were taken from the real data set with the following inputs: birth weight; gestational age; Apgar score; previous report of stillbirth. The output (risk of neonatal death) was given by the model.
Consider, for example, a newborn with a birth weight of 3,500g, a gestational age of 38 weeks, an Apgar score of 5 and a previous report of stillbirth of 0. With these four antecedents the following membership functions were activated: normal birth weight for the variable birth weight; term for the variable gestational age; low for the variable Apgar score; few for the variable previous report of stillbirth. Rule 21 was activated and the output set activated was middle. After defuzzification by the centroid method, the result of the system (risk) is 25%.
In the first two cases, both newborns survived.
Accuracy is highest when risk is below 25%, corresponding to a sensitivity of 0.70, a specificity of 0.98, a negative predictive value of 0.99 and a positive predictive value of 0.22. Considering risk values of 4.7%, we obtained a sensitivity of 0.82, a specificity of 0.82, a negative predictive value of 0.99 and a positive predictive value of 0.16. The ROC curve is shown in Figure 5; the area under the curve is 0.90 (95%CI: 0.84-0.96; p < 0.001).
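The four measures reported above are all derived from the 2 × 2 table obtained by applying a risk cut-off. A minimal Python sketch follows. It assumes that a risk at or above the cut-off counts as a positive test (the text does not state whether the comparison is strict), and the records are invented, so the printed values do not reproduce the study's results.

```python
def confusion_counts(risks, died, cutoff):
    """Count outcomes when a risk at or above the cut-off is read as a positive test."""
    tp = fp = fn = tn = 0
    for risk, dead in zip(risks, died):
        positive = risk >= cutoff
        if positive and dead:
            tp += 1
        elif positive and not dead:
            fp += 1
        elif dead:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from the 2 x 2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented records: model risk (%) and observed outcome (True = neonatal death).
risks = [90.3, 25.0, 4.7, 25.0, 4.7, 4.7, 4.7, 4.7]
died = [True, True, True, False, False, False, False, False]

counts = confusion_counts(risks, died, cutoff=25.0)
print(counts)
print(diagnostic_metrics(*counts))
```

Because the positive predictive value depends on how common the outcome is, it tends to stay low for a rare event such as neonatal death even when sensitivity and specificity are high.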

Discussion
In this study, a fuzzy linguistic model to evaluate the risk of neonatal death based on birth weight, gestational age, Apgar score and previous report of stillbirth was proposed. This study is not an epidemiological study about neonatal mortality; it aimed to build a computational predictive model by using fuzzy logic.
Neonatal mortality is a main component of childhood mortality (SUS Information Department. http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sim/cnv/infuf.def, accessed on 14/Jun/2007). Identifying newborns at high risk of neonatal mortality can give the physicians who attend these newborns the information needed to take action and prevent devastating outcomes. There are several methods to estimate the risk of neonatal death. The most commonly used methods are the Pediatric Risk of Mortality (PRISM) 9, the Score for Neonatal Acute Physiology (SNAP) 10 and the Clinical Risk Index for Babies (CRIB) 11.
These scores use several variables and several blood analysis measurements taken while newborns are admitted to neonatal intensive care units. The accuracies estimated by the ROC curve in these studies were 0.90 for CRIB and 0.92 for PRISM. Furthermore, other predictive models need a considerable number of records to establish an association between the outcome, neonatal death, and determinant variables such as birth weight, Apgar score, previous report of stillbirth and gestational age, which is not necessary in the fuzzy model. Other approaches, like artificial neural networks or neuro-fuzzy systems, need records to train, check and validate the model. The model presented here provided good results, as shown in the ROC curve.
The advantage of the risk estimator presented here is that the model values do not change over time, which is not true for experts' opinions. In fact, experts could provide different values for death risk under the same conditions, depending on their positive or negative feelings and on their geographic location. It is common to get different answers from experts to the same question within a week. In this sense, the model presented here could offer a standardization of the classification process. In contrast to PRISM, SNAP and CRIB, our model did not use several blood analyses.
In addition, this model prevents the variability in the analysis of newborn conditions provided by different health professionals, which could yield inequalities in treatment. Besides, the fuzzy model is very simple and involves low computing costs, making it an easy and inexpensive option, factors that are particularly relevant in developing and poor countries.
On the other hand, it is not possible to compare this model with other predictive models because the fuzzy model does not use blood analyses and current models such as PRISM, SNAP or CRIB do not use the fuzzy variables.
In cities where there are no experts available, the model can help in understanding and evaluating the risk of neonatal death based only on information regarding birth weight, Apgar score, previous report of stillbirth and gestational age, without the need for laboratory tests and with the value obtained immediately after birth. This information is available even in very modest conditions.
A similar model was developed based only on the agreement of expert opinions 4.
On the other hand, it is important to bear in mind that the number of fuzzy rules grows exponentially with the number of input variables, and this can impair the model's performance. Besides, the inclusion of new variables does not guarantee the improvement and robustness of the model.
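The growth in the number of rules can be made concrete with a short Python calculation: a complete rule base needs one rule per combination of labels, so its size is the product of the number of fuzzy sets per input. The fifth input below is hypothetical and serves only to show the effect of adding a variable.

```python
from math import prod


def rule_count(sets_per_input):
    """Size of a complete rule base: one rule per combination of input labels."""
    return prod(sets_per_input)


print(rule_count([3, 2, 2, 2]))     # the four inputs of this model: 24 rules
print(rule_count([3, 2, 2, 2, 3]))  # with a hypothetical fifth input of three sets: 72 rules
```

Each added input multiplies the number of rules that the expert must specify and that the inference engine must evaluate.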
The application of fuzzy set theory in medicine and, particularly, in pediatrics, is a new area of research. Nevertheless, this approach has provided promising results in several medical applications, proposing a paradigmatic shift in health sciences 2,12. The possibility of building a computational interface makes this fuzzy model a promising and useful predictive tool.

Contributors
All authors participated equally in the study.

Figure 1. Fuzzy knowledge basic structure.

Figure 5. ROC curve for the fuzzy model.