Feasibility of transitioning from APACHE II to SAPS III as prognostic model in a Brazilian general intensive care unit. A retrospective study

ABSTRACT
CONTEXT AND OBJECTIVE: Prognostic models reflect the population characteristics of the countries from which they originate. Predictive models should be customized to fit the general population where they will be used. The aim here was to perform external validation on two predictive models and compare their performance in a mixed population of critically ill patients in Brazil.
DESIGN AND SETTING: Retrospective study in a Brazilian general intensive care unit (ICU).
METHODS: This was a retrospective review of all patients admitted to a 41-bed mixed ICU from August 2011 to September 2012. Calibration (assessed using the Hosmer-Lemeshow goodness-of-fit test) and discrimination (assessed using area under the curve) of APACHE II and SAPS III were compared. The standardized mortality ratio (SMR) was calculated by dividing the number of observed deaths by the number of expected deaths.
RESULTS: A total of 3,333 ICU patients were enrolled. The Hosmer-Lemeshow goodness-of-fit test showed good calibration for all models in relation to hospital mortality. For in-hospital mortality there was a worse fit for APACHE II in clinical patients. Discrimination was better for SAPS III for in-ICU and in-hospital mortality (P = 0.042). The SMRs for the whole population were 0.27 (confidence interval [CI]: 0.23 - 0.33) for APACHE II and 0.28 (CI: 0.22 - 0.36) for SAPS III.
CONCLUSIONS: In this group of critically ill patients, SAPS III was a better prognostic score, with higher discrimination and calibration power.


INTRODUCTION
Prognostic models reflect the population characteristics of the countries from which they originate. The development of the Acute Physiology and Chronic Health Evaluation II (APACHE II) system was based on a cohort of patients in the United States, 1 and it has been used in many intensive care units around the world. In contrast, the Simplified Acute Physiology Score III (SAPS III) was validated in a multicenter and multinational cohort study. 2 Predictive models should be customized to fit the case-mix population where they will be used because the outcomes in the original databases and period from which the models were derived may be different from the databases of intensive care units (ICUs) using the models. 3,4 It is not clear whether calibration of the established models for local circumstances would enhance their accuracy in stratifying patients. 5,7-9 In South America, SAPS III was calibrated with a level of 1.3 (i.e. the ratio of observed to predicted mortality was 1.3). 2,10,13
Comparison between observed and predicted mortality rates could serve as an indicator of ICU performance, and lead to overall improvement in healthcare services. However, ICU profiles vary worldwide, depending on the proportions of medical and surgical patients, admission and discharge policies, availability of intermediate care units and staffing with intensive care specialists. 13
Any transition from a well-established approach to a new one requires caution and validation. Replacing APACHE II with SAPS III has some advantages, the most important being that SAPS III is the only prognostic score that included a cohort of patients from South America in its development.

OBJECTIVE
In the present study, we aimed to perform external validation on two predictive models and directly compare their performance in an independent population of mixed critically ill patients.

Data collection
This study was approved by the Ethics Committee of Hospital Israelita Albert Einstein and, because of the retrospective nature of the study, the informed consent requirement was waived.
The data were collected from all patients admitted to a mixed 41-bed ICU in a tertiary-level private hospital in Brazil from August 2011 to September 2012.
Data were retrospectively collected using APACHE II only between August 2011 and December 2011, and using SAPS III only between May 2012 and September 2012. From January 2012 to April 2012, during a period of calibration, both scores were calculated for all patients admitted to the ICU and were collected for analysis. The data collection practices were standardized and performed by a trained nurse or physician. All data were checked for implausible and outlying values. The data included age, gender and type of admission (clinical, elective surgery or emergency surgery).

Study population
All ICU admissions were enrolled during the period analyzed.
The exclusion criteria were: age < 18 years, missing data and not receiving ICU care. The admissions between January 2012 and April 2012 were used as a validation database to study the performance of APACHE II versus SAPS III for all admissions and in subgroups according to the type of admission.

Scores and predicted mortalities
The calculations of the individual scores for each model were based on the most disordered physiological values recorded during the first 24 hours of ICU admission for APACHE II and were based on the variables measured one hour before and after ICU admission for SAPS III. The mortality probabilities for APACHE II and SAPS III were calculated using the original equations. 1,2
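As an illustration of how a score is converted into a mortality probability, the following minimal Python sketch applies the logistic equations as they are commonly quoted from the original APACHE II and SAPS 3 publications. The coefficients should be verified against references 1 and 2 before any use. The function names are hypothetical, the APACHE II diagnostic category weight must be taken from the published table and is simply passed in here, and the regionally customized SAPS 3 equations are not shown.

```python
import math


def apache2_mortality(score, diagnostic_weight=0.0, emergency_surgery=False):
    """Predicted hospital mortality from the APACHE II logistic equation.

    diagnostic_weight is the published diagnostic category weight
    (assumed to be supplied by the caller; 0.0 is only a placeholder).
    """
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603  # additional term for post-emergency surgery
    return 1.0 / (1.0 + math.exp(-logit))


def saps3_mortality(score):
    """Predicted hospital mortality from the general SAPS 3 equation."""
    logit = -32.6659 + 7.3068 * math.log(score + 20.5958)
    return 1.0 / (1.0 + math.exp(-logit))
```

Under these equations, for example, an APACHE II score of 20 with no diagnostic weight corresponds to a predicted mortality of roughly 35%, and a SAPS 3 score of 50 to roughly 17%.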

Performance of the scores
The calibration of the scores was tested using the Hosmer-Lemeshow goodness-of-fit procedure, which was calculated by dividing the admissions into deciles according to the risk of death. A chi-square statistic was determined for each decile, and the sum of these values across the ten deciles gave the test value. 14 A high P value would indicate a good fit for the model. Hosmer-Lemeshow is a test for assessing agreement between the actual and predicted death rates. The discriminative ability of the models was assessed using receiver operating characteristic (ROC) curves and the respective areas under the curves (AUC). 15 The AUC is an expression of the model's ability to discriminate correctly between survivors and non-survivors.
The standardized mortality ratio (SMR) was calculated using the models by dividing the number of observed deaths by the number of expected deaths. Confidence intervals for the SMR were calculated to test the model's uniformity-of-fit, using previously described methods. 16 The variables were compared between the three periods using analysis of variance (ANOVA). Calibration curves were constructed by plotting the predicted death rates stratified in 5% intervals of mortality risk (x-axis) versus observed death rates (y-axis). Finally, we constructed a model using Cox regression analysis with APACHE II and SAPS III as independent factors. Statistical significance was defined as P < 0.05 and the results are presented as mean ± standard deviation unless indicated otherwise. All the statistical procedures were performed using the SPSS 20.0 statistical package (SPSS, Chicago, Illinois, USA).
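The two performance measures described above can be sketched in plain Python as follows. This is an illustrative sketch and not the SPSS procedure used in the study. It assumes equal-sized risk groups, at least one patient per group, predicted probabilities strictly between 0 and 1, and groups - 2 degrees of freedom for the Hosmer-Lemeshow test. All function names are hypothetical.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df < 2 or df % 2:
        raise ValueError("df must be a positive even integer")
    x = stat / 2.0
    return math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(probs, deaths, groups=10):
    """Return (chi-square statistic, P value) over risk-ordered groups."""
    pairs = sorted(zip(probs, deaths))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        # Contributions from non-survivors and from survivors
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, chi2_sf_even_df(stat, groups - 2)


def auc(probs, deaths):
    """Chance that a random non-survivor has a higher predicted risk
    than a random survivor (ties count as one half)."""
    died = [p for p, d in zip(probs, deaths) if d == 1]
    lived = [p for p, d in zip(probs, deaths) if d == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in died for b in lived)
    return wins / (len(died) * len(lived))
```

The pairwise AUC is quadratic in the number of patients, which is acceptable for a cohort of a few thousand admissions; a rank-based formulation would be preferable for much larger datasets.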
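The construction of the calibration curve can be sketched as below: patients are grouped into 5% bands of predicted risk, and the observed death rate in each band is compared with the mean predicted risk. This is a minimal illustration with a hypothetical function name, not the authors' plotting code. A band whose observed rate lies below its mean predicted risk falls under the diagonal of ideal prediction, which corresponds to overestimation of mortality.

```python
def calibration_bands(probs, deaths, width=0.05):
    """Return one row per non-empty risk band:
    (band lower bound, number of patients, mean predicted risk, observed death rate)."""
    n_bands = round(1.0 / width)
    bands = {}
    for p, d in zip(probs, deaths):
        # Multiply instead of dividing by width to limit rounding problems;
        # a probability of exactly 1.0 is placed in the last band.
        idx = min(int(p * n_bands), n_bands - 1)
        bands.setdefault(idx, []).append((p, d))
    rows = []
    for idx in sorted(bands):
        members = bands[idx]
        n = len(members)
        mean_predicted = sum(p for p, _ in members) / n
        observed_rate = sum(d for _, d in members) / n
        rows.append((idx / n_bands, n, mean_predicted, observed_rate))
    return rows
```

Each returned row gives one point of the curve (mean predicted risk on the x-axis, observed rate on the y-axis), and the patient count per band can be drawn as the accompanying bar.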

Study population
A total of 3,333 ICU admissions were enrolled until the end of September 2012. The formation of the database is presented in Table 1. The characteristics of the population in the three periods are presented in Table 2. The ICU and hospital mortality and the APACHE II score decreased over time, and the SAPS III score increased during the periods.

Calibration and discrimination
The Hosmer-Lemeshow goodness-of-fit statistics supported model fit for all in-ICU mortality models with the exception of APACHE II for patients in the calibration database undergoing elective surgery. For in-hospital mortality, there was worse fit for APACHE II among clinical patients during the first period and for SAPS III among patients in the calibration database undergoing elective surgery (Table 3). The calibration curves for APACHE II and SAPS III showed overestimation of the risk of death in all ranges of predicted mortality (Figure 1). Discrimination, as tested by the AUC, among general and clinical patients, was better for SAPS III in relation to in-ICU and in-hospital mortality (P = 0.042) (Table 4). Figure 2 shows the ROC curves for SAPS III and APACHE II, for in-ICU mortality in the calibration database in different situations.

Standardized mortality ratio
The SMRs for the whole population were 0.27 (CI: 0.23 - 0.33) for APACHE II and 0.28 (CI: 0.22 - 0.36) for SAPS III. In the calibration database, the SMRs for APACHE II and SAPS III were 0.33 (CI: 0.22 - 0.50) and 0.36 (CI: 0.25 - 0.55), respectively. For all models, the SMRs showed some variation across the spectrum of patients. The SMRs ranged from 0.24 to 0.46 for APACHE II, and from 0.09 to 0.31 for SAPS III. In the calibration database, the SMRs ranged from 0.13 to 0.38 for APACHE II, and from 0.18 to 0.40 for SAPS III (Table 5).
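For readers who wish to reproduce this type of figure, the SMR is the number of observed deaths divided by the sum of the predicted probabilities of death. The sketch below pairs it with an approximate 95% confidence interval based on a log-normal approximation that treats the observed deaths as a Poisson count. This interval method is an assumption made for illustration and is not necessarily the method of reference 16. The function name is hypothetical, and an SMR below 1 indicates fewer deaths than the model predicted.

```python
import math


def smr_with_ci(observed_deaths, predicted_probs, z=1.96):
    """Return (SMR, lower bound, upper bound).

    Expected deaths are the sum of individual predicted probabilities.
    The interval uses SE(ln SMR) ~ 1 / sqrt(observed deaths), an
    approximation assumed here for illustration only.
    """
    expected = sum(predicted_probs)
    smr = observed_deaths / expected
    se_log = 1.0 / math.sqrt(observed_deaths)
    lower = smr * math.exp(-z * se_log)
    upper = smr * math.exp(z * se_log)
    return smr, lower, upper
```

With hypothetical data of 25 observed deaths among 100 patients who each have a predicted risk of 0.5, the expected number of deaths is 50 and the SMR is 0.5, with an approximate interval of 0.34 to 0.74.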

Cox regression model
The hazard ratios for in-hospital and in-ICU mortality using APACHE II as an independent factor were 1.08 (95% CI: 1.04 - 1.12) and 1.09 (95% CI: 1.04 - 1.14), respectively. For SAPS III, the hazard ratios for in-hospital and in-ICU mortality were 1.03 (95% CI:

DISCUSSION
The external validation of these two widely used prognostic models showed good discrimination and good calibration when applied to the same independent population of Brazilian ICU patients.
The transition from APACHE II to SAPS III in this Brazilian ICU was feasible and, in some scenarios, SAPS III had even better performance than APACHE II.
The SAPS III score was developed using data from cancer patients. Costa e Silva et al. 19 showed, in Brazilian critically ill patients with acute kidney injury, that SAPS III presented good discrimination and calibration performances, accurately predicting mortality in this group of patients. Finally, Nassar et al. 13 demonstrated that SAPS III had good discrimination and inadequate calibration in a general cohort of Brazilian patients. 21,22 These scores have also been used for other purposes, such as estimation of prolonged mechanical ventilation in surgical patients. 23,25,26 One limitation of this study is that our hospital has a lower mortality rate than other Brazilian institutions, which makes it difficult to tell whether the scores perform poorly or the ICU performs exceptionally well.

CONCLUSIONS
We showed, in a Brazilian cohort of critically ill patients, that SAPS III was a better prognostic score, with higher discrimination and calibration power. The transition from APACHE II to SAPS III was feasible in this scenario.

Figure 1. Calibration curve for APACHE II (black line and bar) and SAPS III (gray line and bar). The bars represent the number of patients in each risk group. The dashed diagonal line indicates ideal prediction (predicted = observed mortality).

Table 1. Study database. APACHE II = acute physiology and chronic health disease classification system II; SAPS III = simplified acute physiology score III.

Table 2. Characteristics of the study population. ICU = intensive care unit; APACHE II = acute physiology and chronic health disease classification system II; SAPS III = simplified acute physiology score III.

Table 3. Model calibration assessed by means of Hosmer-Lemeshow goodness-of-fit statistics