Mortality prediction model using data from the Hospital Information System

OBJECTIVE: To develop a hospital mortality prediction model based on data from the Hospital Information System of the Brazilian National Health System.
METHODS: This was a cross-sectional study using data from 453,515 authorizations for hospital admission relating to 332 hospitals in Rio Grande do Sul, Southern Brazil, in the year 2005. From the ratio between observed and expected deaths, the hospitals were ranked in an adjusted manner, and this was compared with the crude ranking of the mortality rate. Logistic regression was used to develop a predictive model for the likelihood of hospital mortality according to sex, age, diagnosis and use of an intensive care unit. Confidence intervals (95%) were obtained for the 206 hospitals with more than 365 hospital admissions per year.
RESULTS: An index for the risk of hospital mortality was obtained. Ranking the hospitals using only the crude mortality rate differed from the ranking when it was adjusted according to the predictive likelihood model. Among the 206 hospitals analyzed, 40 presented observed mortality that was significantly greater than what was expected, while 58 hospitals presented mortality that was significantly lower than expected. Use of an intensive care unit presented the greatest weight in making up the risk index, followed by age and diagnosis. When the hospitals attended patients with widely differing profiles, the risk adjustment did not result in a definitive indication regarding which provider was best. Among this group of hospitals, those of large size presented greater numbers of deaths than would be expected from the characteristics of the hospital admissions.
CONCLUSIONS: The hospital mortality risk index was shown to be an appropriate predictor for calculating the expected death rate, and it can be applied to evaluate hospital performance. It is recommended that, in comparing hospitals, the adjustment using the predictive likelihood model for the risk should be used, with stratification according to hospital size.
DESCRIPTORS: Hospital Mortality. Hospital Information Systems, utilization. Logistic Models. Outcome Assessment (Health Care). Cross-Sectional Studies.


INTRODUCTION
Medical assistance services have been focusing on performance evaluations for healthcare systems in order to improve them, in view of the limitations on financial resources. 25 Hospital mortality is a traditional indicator of hospital performance and, under conditions in which death is not a rare event, it is a useful tool for indicating services that may present quality problems. Differences in mortality rates between hospitals may occur as a function of the severity profile of the population attended. Thus, this indicator needs to be controlled and adjusted in relation to variables that might affect the result. The residual differences form the indicator for the quality of care. 13 A variety of severity classification systems have been proposed. The Charlson Comorbidity Index (CCI) 4 and the Diagnosis Related Group (DRG) 19 classification use data from secondary diagnoses to attribute the risk of death to patients, and they can be applied to administrative databases. The American Society of Anesthesiologists (ASA) 21 system is used to classify surgical patients according to severity, from the preoperative risk. 22 The Acute Physiology and Chronic Health Evaluation (APACHE), 14 APACHE II and APACHE III systems measure the severity of the clinical condition of patients admitted to intensive care units (ICUs).
Administrative databases are increasingly used in hospital performance evaluations. 26 In Brazil, the hospital information system (HIS) of the Brazilian National Health System (SUS) has been shown to be a suitable option, since it makes large quantities of data available soon after the hospitalization period 9 and because its data can be trusted for hospital performance evaluations. 6 Studies that have used logistic regression to evaluate the risk of death from the HIS-SUS database have analyzed specific diagnoses such as acute myocardial infarction, 7 coronary surgery, 5,19 proximal fractures of the femur 22 and infectious diarrhea, 3 or have evaluated sentinel events 9 or specific age groups, such as elderly people. 1 Other investigators have used the CCI, 17 which attributes weights to 17 clinical conditions that are present in secondary diagnoses.
The aim of the present study was to develop a predictive model for hospital death based on the data available from HIS-SUS.

METHODS
The authorizations for hospital admission (AHAs) for all SUS hospitals in the state of Rio Grande do Sul, Southern Brazil, in the year 2005 were analyzed. These were obtained from a computerized database that is processed nationally by DATASUS and is available in the public domain on the Internet.
The variable "diagnosis" was created in accordance with the tenth version of the International Classification of Diseases (ICD-10). The database was formed by 739,964 AHAs. Of these, 25,057 psychiatric AHAs, 121,372 AHAs relating to obstetrics, pregnancy, delivery and the puerperium, and 1,338 AHAs relating to patients under long-term care were excluded, because these categories had low mortality rates. In addition, 710 AHAs relating to phthisiology were excluded because of the low number of admissions relating to this specialty, and 137,972 AHAs relating to individuals under the age of 18 years were excluded because the risk of death at this age is lower than among adults for physiological reasons.
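The exclusion arithmetic above can be checked directly. The following is a small sketch; the dictionary keys are illustrative labels chosen here, not field names from the AHA database.

```python
# Counts as reported in the text; the key names are illustrative labels only.
total_ahas = 739_964
exclusions = {
    "psychiatric": 25_057,
    "obstetrics_pregnancy_delivery_puerperium": 121_372,
    "long_term_care": 1_338,
    "phthisiology": 710,
    "under_18_years": 137_972,
}

# Subtract every excluded category from the initial database.
final_ahas = total_ahas - sum(exclusions.values())
print(final_ahas)  # 453515, the size of the final database
```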
The final database consisted of 453,515 type I AHAs, relating to clinical and surgical specialties. This database was then randomly divided into a development sample (two thirds of the total) and a validation sample (one third of the total). Modeling was performed on the first sample. The observation unit was the admission, and the data were aggregated at the hospital level, in order to compare the establishments.
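A random division of this kind might be sketched as follows. The text does not describe the randomization procedure, so the fixed seed, the exact two-thirds cut and the function name are assumptions made for illustration only; the validation sample size reported later in the text (approximately 145,000) suggests that the actual procedure did not use an exact one-third cut.

```python
import random

def split_development_validation(records, seed=2005):
    """Randomly split records into development (~2/3) and validation (~1/3).

    The seed and the exact cut point are assumptions, not the study's procedure.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 453,515 admission records.
development, validation = split_development_validation(range(453_515))
print(len(development), len(validation))
```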
The logistic regression model for hospital mortality tested all the variables available in the AHA database for the year 2005. To make up the variable "diagnosis", the chapters of the ICD-10 that presented the greatest expected numbers of deaths (chapters I, II, VI, IX, X and XVIII) were kept in independent categories, while the remainder were grouped as the reference category.
Variables presenting p values < 0.25 in the univariable logistic regression analysis were included in the multivariable modeling. The modeling followed the strategy recommended by Hosmer & Lemeshow 11 (2000). Variables were removed after comparing the likelihood ratios (-2logL) of models with and without the variable in question. Variables were kept in the model on the basis of theoretical justification and statistical significance.
The fit of the final model was evaluated using the Hosmer & Lemeshow (H&L) test. 11 Sensitivity analyses on the H&L test performed by Kramer & Zimmerman 15 (2007) in relation to the fit of predictive models for hospital mortality showed that when n = 50,000, the H&L test erroneously rejected the null hypothesis in 100% of the models in their simulation study. For samples of n = 5,000, the H&L test incorrectly rejected only one fifth of the models. Since the size of the smallest sample in the present study (the validation sample) was approximately 145,000 AHAs, it was decided to test the fit using random samples of 5,000 AHAs, in conformity with the smallest sample size used by Kramer & Zimmerman 15 (2007). According to Ory & Mokhtarian 20 (2010), subsamples should be formed in large databases in order to test the robustness of the specifications of the model and to quantify the bias in large samples using the chi-square statistical test.
The final model was evaluated according to its sensitivity, specificity and accuracy, and on the basis of the percentage improvement of the model in relation to the initial deviance (likelihood ratio).
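As an illustration of these three measures, the sketch below computes them from predicted probabilities. The classification threshold of 0.5, the function name and the toy data are assumptions; the text does not report the cut-off used in the study.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at a given threshold.

    The threshold is an assumed value, not one reported in the study.
    """
    tp = fp = tn = fn = 0
    for died, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if died == 1 and predicted == 1:
            tp += 1
        elif died == 1 and predicted == 0:
            fn += 1
        elif died == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # deaths correctly predicted
    specificity = tn / (tn + fp)  # survivors correctly predicted
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Toy data: 1 = death, 0 = discharge alive.
sens, spec, acc = classification_metrics([1, 1, 0, 0, 0], [0.8, 0.3, 0.6, 0.2, 0.1])
print(sens, spec, acc)
```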
For the logistic regression model, the area under the ROC curve and the accuracy were obtained. The validated model enabled development of the risk index (RI), as suggested by Le Gall et al 16 (1993). In this approach, the coefficient (β) of each variable is multiplied by 10 and rounded to the closest whole number. The purpose of the RI was to facilitate the subsequent use of the model that had been generated, and it was also calculated for the 453,515 AHAs. The area under the ROC curve and accuracy were also obtained for the validation sample.
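The conversion of coefficients into risk index points can be sketched as below. The β values are hypothetical and were chosen only for illustration; they are not the fitted coefficients of the study, which belong to its Table 2.

```python
# Hypothetical logistic regression coefficients (beta); NOT the study's values.
hypothetical_betas = {
    "male_sex": 0.21,
    "age_60_plus": 1.43,
    "icu_8_plus_days": 2.27,
}

def risk_points(betas):
    """Multiply each coefficient by 10 and round to the closest whole number."""
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return {name: round(10 * beta) for name, beta in betas.items()}

points = risk_points(hypothetical_betas)
print(points)
```

Working with whole-number points rather than raw coefficients lets the index be summed by hand or applied directly in a database query, which matches the stated purpose of facilitating later use of the model.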
The likelihood of hospital death for each admission was obtained using the logistic regression model. The expected number of deaths (E) for each hospital was obtained by summing the likelihoods of death over that hospital's admissions.
In the second stage, a database on the 332 hospitals was constructed, containing the characteristics of the hospital, the deaths observed (O) and the deaths expected (E), and the ratio O/E was calculated. The results relating to the O/E ratio made it possible to compare the observed deaths with the estimate for expected deaths from the predictive model, using the hospital admission characteristics. This was, therefore, an indicator of the institution's performance.
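The aggregation from admissions to hospitals can be sketched as follows. The tuple layout, the function name and the toy probabilities are assumptions for illustration; in the study the probabilities come from the fitted logistic regression model.

```python
from collections import defaultdict

def observed_expected(admissions):
    """Aggregate admissions into per-hospital (O, E, O/E).

    admissions: iterable of (hospital_id, died, predicted_probability),
    where died is 1 or 0. Probabilities from a logistic model are > 0,
    so E is positive for any hospital with at least one admission.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital_id, died, prob in admissions:
        observed[hospital_id] += died   # O: count of deaths
        expected[hospital_id] += prob   # E: sum of predicted likelihoods
    return {h: (observed[h], expected[h], observed[h] / expected[h])
            for h in observed}

# Toy data with two hypothetical hospitals.
toy = [("A", 1, 0.5), ("A", 0, 0.25), ("A", 0, 0.25),
       ("B", 1, 0.125), ("B", 1, 0.125), ("B", 0, 0.25)]
result = observed_expected(toy)
print(result)  # {'A': (1, 1.0, 1.0), 'B': (2, 0.5, 4.0)}
```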
To evaluate the hospitals' performance, the confidence intervals for the ratio between observed and expected deaths were calculated in accordance with the methodology proposed by Hosmer & Lemeshow 10 (1995). Only the hospitals with a statistically significant difference between observed and expected deaths were classified. Only the hospitals with a minimum of 365 admissions during the year 2005 were kept in the final evaluation. This resulted in 208 hospitals with a total of 428,701 AHAs. Thus, the selection and calculation of confidence intervals was done only for hospitals with a minimum of one admission per day, on average. The 95% confidence interval (95% CI) could be calculated from the expression $\text{95\% CI} = \mathrm{EXP}\left[\mathrm{LN}(O/E) \pm 1.96\sqrt{\mathrm{V2}/O^{2}}\right]$, where EXP = exponential function; LN = natural logarithm of the ratio O/E; $O^{2}$ = square of the number of deaths observed; V2 = variance of the binomial distribution [V2 = prob(1-prob)]; prob = likelihood of death. In this manner, the upper and lower limits for the O/E ratio were obtained. If the 95% CI included the value 1, the O/E ratio was considered non-significant, i.e. there was no statistically significant difference between the observed and expected deaths.
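A minimal sketch of such an interval is given below. It assumes a log-scale form that is consistent with the quantities defined in the text, namely EXP[LN(O/E) ± 1.96 × sqrt(V2 / O²)], and it further assumes that V2 is the sum of prob(1-prob) over a hospital's admissions. Both points, along with the function names, are assumptions for illustration rather than a confirmed transcription of the Hosmer & Lemeshow method.

```python
import math

def oe_confidence_interval(observed, probabilities, z=1.96):
    """Return (O/E, lower, upper) under the assumed log-scale formula."""
    if observed == 0:
        # LN(0) and division by O**2 are undefined when no deaths occur.
        raise ValueError("interval undefined when no deaths are observed")
    expected = sum(probabilities)
    # Assumed: binomial variance summed over admissions.
    variance = sum(p * (1 - p) for p in probabilities)
    ratio = observed / expected
    half_width = z * math.sqrt(variance / observed ** 2)
    lower = math.exp(math.log(ratio) - half_width)
    upper = math.exp(math.log(ratio) + half_width)
    return ratio, lower, upper

def is_significant(lower, upper):
    """The O/E ratio is non-significant when the interval includes 1."""
    return not (lower <= 1.0 <= upper)

# Toy hospital: 100 admissions with predicted risk 0.1 each, 20 deaths observed.
ratio, lower, upper = oe_confidence_interval(20, [0.1] * 100)
print(ratio, lower, upper, is_significant(lower, upper))
```

Under this form the interval cannot be computed for a hospital with no observed deaths, which is why the function raises an error in that case.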
From the O/E ratio, the adjusted ranking of the hospitals could be obtained. This adjusted ranking was compared with the crude ranking, defined as the mortality rate for each hospital. Cases in which the O/E ratio was greater than 1 signified that the observed mortality in the hospital was greater than the adjusted mortality expected according to the model, taking the number of admissions into consideration. Thus, the higher the O/E ratio was, the worse the hospital's performance was.
Comparison between the ranking of the hospitals according to the crude mortality rate and the ranking according to the O/E ratio made it possible to see the change in the establishment's position caused by adjustment for the admission characteristics.
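The comparison of the two rankings can be illustrated with a short sketch. The hospital labels and values are hypothetical, and position 1 is taken to be the best (lowest) value under each criterion.

```python
def rank_positions(values):
    """Return {hospital: position}, with position 1 for the lowest value.

    Ties keep their insertion order because sorted() is stable.
    """
    ordered = sorted(values, key=values.get)
    return {hospital: i + 1 for i, hospital in enumerate(ordered)}

# Hypothetical hospitals; not data from the study.
crude_rate = {"A": 0.04, "B": 0.06, "C": 0.09}
oe_ratio = {"A": 1.30, "B": 0.80, "C": 1.05}

crude_rank = rank_positions(crude_rate)
adjusted_rank = rank_positions(oe_ratio)

# Positive shift = the hospital falls in the ranking after adjustment.
shift = {h: adjusted_rank[h] - crude_rank[h] for h in crude_rate}
print(shift)  # {'A': 2, 'B': -1, 'C': -1}
```

In this toy case hospital A has the lowest crude mortality rate but the highest O/E ratio, so it moves from first to last place once the admission characteristics are taken into account.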
The analyses were performed for homogeneous groups of hospitals, stratified according to size, using the ranking from the crude mortality rate within each stratum. Hospital size was defined according to the number of beds, and was classified as small (up to 49 beds), medium (50 to 149 beds) or large (150 or more beds).
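The size strata defined above translate into a simple classification rule. The function name is illustrative.

```python
def hospital_size(beds):
    """Classify a hospital by number of beds, using the cut-offs in the text."""
    if beds < 0:
        raise ValueError("number of beds cannot be negative")
    if beds <= 49:
        return "small"   # up to 49 beds
    if beds <= 149:
        return "medium"  # 50 to 149 beds
    return "large"       # 150 or more beds

print([hospital_size(b) for b in (49, 50, 149, 150)])
# ['small', 'medium', 'medium', 'large']
```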

RESULTS
The mortality rate calculated for the 332 hospitals was 6.3%. Table 1 describes the main characteristics of the admissions studied, and presents the variables that were tested in the model. Table 2 presents the results for the final model and the scores from the indicators for constructing the risk index.
The variables of sex and diseases of the circulatory system were kept in the model for theoretical reasons. Given that the outcome of interest was hospital mortality, it was important to control for these variables because they have an important role in proportional mortality. These variables improved the sensitivity and discrimination of the model. Use of an ICU was the variable with the greatest weight, followed by age of 60 years or over. The information on comorbidities was not included in the analysis because of the low rate of filling out the field for secondary diagnosis in the database (12.1%). The variables in the adjusted model were categorized as 0 = no and 1 = yes. From the coefficient of each variable, the following equation was constructed to calculate the risk index: RI = 2 (male sex) + 6 (age 40 to 59 years) + 14 (age 60 years or over) + 13 (chapter I, infectious/parasitic) + 8 (chapter II, neoplasia) + 10 (chapter VI, nervous system) + 1 (chapter IX, circulatory system) + 6 (chapter X, respiratory system) + 12 (chapter XVIII, abnormal signs/symptoms) + 9 (emergency) + 21 (use of ICU: one to two days) + 17 (use of ICU: three to seven days) + 23 (use of ICU: eight or more days).
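The risk index equation can be applied to a single admission as in the sketch below. The weights are those given in the equation; the dictionary keys, the function name and the example admission are illustrative assumptions. Each indicator is 1 (yes) or 0 (no), and indicators that are absent are treated as 0.

```python
# Weights from the risk index equation; key names are illustrative labels.
RI_WEIGHTS = {
    "male_sex": 2,
    "age_40_59": 6,
    "age_60_plus": 14,
    "ch01_infectious_parasitic": 13,
    "ch02_neoplasia": 8,
    "ch06_nervous_system": 10,
    "ch09_circulatory_system": 1,
    "ch10_respiratory_system": 6,
    # Abnormal signs/symptoms: ICD-10 chapter XVIII, as listed in the Methods.
    "ch18_abnormal_signs_symptoms": 12,
    "emergency": 9,
    "icu_1_2_days": 21,
    "icu_3_7_days": 17,
    "icu_8_plus_days": 23,
}

def risk_index(indicators):
    """Sum the weights of the indicators that are set to 1."""
    unknown = set(indicators) - set(RI_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown indicators: {sorted(unknown)}")
    return sum(RI_WEIGHTS[name] * value for name, value in indicators.items())

# Hypothetical admission: man aged 60 or over, respiratory diagnosis,
# admitted as an emergency, three to seven days in the ICU.
example = {"male_sex": 1, "age_60_plus": 1, "ch10_respiratory_system": 1,
           "emergency": 1, "icu_3_7_days": 1}
print(risk_index(example))  # 2 + 14 + 6 + 9 + 17 = 48
```

The index is a score, not a probability. Converting it into a likelihood of death would also require the model intercept, which is not stated in the text.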
The final predictive model was shown to be adequate for calculating the likelihood of hospital death. The logistic regression model presented an area under the ROC curve of 0.781 (95% CI: 0.778; 0.784) in the development sample and 0.780 (95% CI: 0.775; 0.785) in the validation sample. The final model was considered to be a good fit according to the H&L test (p = 0.256) on the random sample of 5,000 AHAs.
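For reference, the area under the ROC curve equals the probability that a randomly chosen death receives a higher predicted likelihood than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison on toy data; the function name and data are illustrative, and this quadratic-time version is meant only to show the definition, not to process several hundred thousand admissions.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    deaths = [s for y, s in zip(y_true, y_score) if y == 1]
    survivors = [s for y, s in zip(y_true, y_score) if y == 0]
    if not deaths or not survivors:
        raise ValueError("both outcomes must be present")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))

# Toy data: 1 = death, 0 = discharge alive.
auc = roc_auc([1, 1, 0, 0, 0], [0.8, 0.3, 0.6, 0.2, 0.1])
print(auc)
```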
Among the 208 hospitals that had 365 or more admissions during the year 2005, two presented numerical problems in calculating the confidence intervals because no deaths were observed.
Among the 206 hospitals for which the 95% CI could be calculated, 40 presented observed mortality that was significantly greater than expected, i.e. their performance was worse than predicted by the model. On the other hand, 58 hospitals presented observed mortality that was significantly lower than expected, i.e. better performance after adjustment according to the model. Table 3 presents the ranking according to the crude mortality rate and the ranking according to the adjustment criteria, for the large-sized hospitals with statistically significant ratios. The large-sized hospitals were the ones that, together, presented the greatest numbers of deaths in relation to what would be expected from the admission characteristics.
From analysis on the set of hospitals in Table 3, it can be seen that the rankings changed when the mortality rates were adjusted.

DISCUSSION
In the present study, application and validation of a predictive model for hospital deaths using variables available in the HIS-SUS database made it possible to predict these occurrences adequately. The admission characteristics may indirectly indicate the severity of patients' conditions. This model can be used to predict hospital deaths and is in accordance with other proposals in the literature that also used logistic regression models to predict hospital deaths according to patients' profiles. 1,7,17 The final model presented good predictive performance, with an area under the ROC curve of 0.781, and it is proposed that this should be a general index. Other studies have been conducted in relation to specific diagnoses, using variables from the HIS-SUS database to predict hospital deaths. These studies included patient attributes and found areas under the ROC curve of 0.750, 1 0.683 17 and 0.586. 18 Martins et al 17 (2001) ascribed the low discriminatory power of their model to the low rate of provision of information on secondary diagnoses. These findings show the difficulty of fitting predictive models for hospital deaths to a database with insufficient information on patients' clinical conditions, particularly with regard to comorbidities for which no information is provided.
Among the variables that remained in the model, the most important one was the use of ICUs, given that patients in extremely serious conditions require technology of greater complexity. Other studies have also found associations between hospital mortality and use of ICUs. The variable "diagnosis", referring to the main diagnosis, contributed significantly to the final model, as in other studies. 1,17 Age and the main diagnosis were considered to be the most important predictors for hospital mortality in a study using HIS-SUS. 17 Identifying the patient's main diagnosis is considered essential for adjusting the risk. 23 The variable of specialty was not kept in the model because of its strong correlation with the variables of diagnosis and use of ICUs.
The variable "emergency", which portrayed the nature of the admission, was shown to be capable of predicting death. This was expected, since patients who are admitted as emergency cases are in a more severe condition at the time of admission than are elective patients. It has been pointed out that this variable may be used as a proxy for the severity of patients' clinical conditions, in the absence of detailed clinical variables, and it has been used in this way with HIS-SUS databases. 17,23 In the present study, this variable improved the discrimination of the model and was kept in it, even though a previous study showed that the HIS data in the city of Rio de Janeiro, Southeastern Brazil, had low reliability. 24 The variable "age" was the second most important predictor of hospital death in our study. As seen in the adjusted model, it is expected that elderly people will present biological conditions of greater fragility than will young individuals. Elderly patients tend to present chronic problems of greater severity, which may increase the mortality rates. 1 Other studies have found associations between age and hospital death. 5,7,17,22 It was observed that males presented a greater chance of death than did females; an association between sex and hospital mortality has already been described in the literature. 1,5,18 On the other hand, in an analysis on myocardial revascularization surgery, Noronha et al 5 (2003) found a greater likelihood of death among women.
The length of patients' hospital stay may vary as a function of the severity of their clinical conditions and the quality of the care received. 23 This was tested as a variable, but it did not present any significant association with hospital mortality, contrary to other studies. 1,18 Goldstein & Spiegelhalter 8 (1996) emphasized the need to use confidence intervals as a way to diminish the uncertainty associated with point estimates. Calculation of confidence intervals was shown to be useful for identifying hospitals with statistically significant differences between the numbers of observed and expected deaths.
It was seen that if hospital performance was ranked only on the basis of the crude mortality rate, the results could be different from those produced with adjustments made using predictive models. Comparative analysis using rankings according to crude mortality rates versus rankings based on the adjustment criteria might show the existence of establishments with low crude mortality rates and lower-than-desired performance, in relation to the whole group. Ash et al 2 (2003) showed that when hospitals attended patients with very different profiles, the risk adjustment did not result in any definitive indication of which provider was the best. Organizing the hospitals according to their profiles makes it possible to compare establishments with similar service characteristics and indicates differences according to the type of establishment. 2 Considering that the risk adjustments were made for the admission characteristics and that the profile of the establishments did not enter the regression model, it was decided to group establishments of similar sizes in order to make better comparisons with the new rankings using the adjusted rates. When the hospitals were stratified according to size, it was seen that in large-sized establishments, the numbers of deaths were greater than what was expected from the admission characteristics. On the other hand, small-sized hospitals presented smaller numbers of deaths than would be expected from the patient profile.
The main limitations of the present study stemmed from the low frequency with which the field for secondary diagnoses was filled out: it was not possible to include in the risk adjustment the comorbidities that patients might have presented at the time of admission, and it was impossible to use the CCI. When secondary diagnoses are used as a risk adjustment variable, failures in documenting this information directly affect the calculation of expected mortality. 23 Another limitation to be taken into consideration is the administrative-financial nature of HIS-SUS. Financial reasons have been seen to potentially interfere with the information provided. 12 It was shown to be important to develop a generic risk index from predictive models, in order to measure the risk of hospital death according to the data available.
In conclusion, it was possible to develop a predictive model for hospital death from the data available in HIS-SUS. Analysis on hospital mortality using the risk index for hospital deaths, adjusted in relation to the admission characteristics, is useful for hospital performance evaluation within HIS-SUS. The risk index can be applied directly to the HIS-SUS database to calculate the expected deaths, with the aim of producing a ranking for the adjusted mortality rate. Ranking based on the ratio of observed/expected deaths with significant confidence intervals may differ in relation to the ranking produced using the crude mortality rate, and this may provide a more faithful indication of the performance of different establishments within the whole group of hospitals of similar size.
Direct use of risk index scores in the database will make it possible to evaluate hospital performance more objectively. Efforts should be made towards better characterization of the risk that patients present during hospitalization, through better provision of information on secondary diagnoses and greater amounts of information, and inclusion of new clinical variables. Furthermore, incorporation of healthcare establishments' profiles into estimates of the likelihood of hospital death may be useful in comparisons between hospitals, taking into consideration characteristics such as teaching activities, juridical nature and size.
Position 1 indicates the best performance and position 25 indicates the worst performance.

Table 1. Admission characteristics and deaths obtained from authorizations within the hospital information system. State of Rio Grande do Sul, Southern Brazil, 2005.

Table 2. Final model and score for the risk index for hospital mortality. State of Rio Grande do Sul, Southern Brazil, 2005.

Table 3. Ranking of large-sized hospitals (150 or more beds) with statistically significant ratios, according to crude rate and adjustment criteria. State of Rio Grande do Sul, Southern Brazil, 2005.