External Validation of a Brazilian Predictive Nomogram for Pathologic Outcomes Following Radical Prostatectomy in Tertiary Teaching Institutions: the USP Nomograms

ABSTRACT
Purposes: (a) To externally validate the nomograms of Crippa and colleagues, which combine PSA, percentage of positive biopsy cores (PPBC) and biopsy Gleason score to predict organ-confined disease (OCD), in a contemporary sample of patients treated at a tertiary teaching institution. (b) To adjust these variables, resulting in predictive nomograms for OCD and seminal vesicle invasion (SVI): the USP nomograms.
Materials and Methods: The accuracy of Crippa and colleagues' nomograms for OCD prediction was examined in 1,002 men who underwent radical prostatectomy between 2005 and 2010 at the University of São Paulo (USP). ROC-derived area under the curve (AUC) and Brier scores were used to assess the discriminant properties of the nomograms for OCD. Nomogram performance was explored graphically with LOESS smoothing plots. Furthermore, univariate analysis and logistic regression models targeted OCD and SVI. Variables consisted of PSA, PPBC, biopsy Gleason score and clinical stage. The resulting predictive nomograms for OCD and SVI were internally validated with bootstrapping and the same procedures described above.
Results: Crippa and colleagues' nomograms for OCD showed ROC AUC = 0.68 (CI: 0.65-0.70), Brier score = 0.17 and overestimation in LOESS plots. USP nomograms for OCD and SVI showed ROC AUC of 0.73 (CI: 0.70-0.76) and 0.77 (CI: 0.73-0.79), respectively, and Brier scores of 0.16 and 0.08, respectively. The LOESS plots showed excellent calibration for OCD and underestimation for SVI.
Conclusions: Crippa and colleagues' nomograms showed moderate discrimination and considerable OCD overestimation. USP nomograms showed good discrimination for OCD and SVI, with excellent calibration for OCD but underestimation of SVI.


INTRODUCTION
Prostate cancer (PCa) is the second most prevalent malignancy among Brazil's male population. Its estimated incidence was 53.84 per 100,000 men in 2010 (1). The pathologic stage of PCa is critical for the success of treatment. Extra-prostatic extension and seminal vesicle invasion influence treatment choices, cure rates and decisions regarding preservation of the neurovascular bundles responsible for erectile function (2). In 1993, Partin's pioneering study (3) estimated the risk of extra-capsular extension, seminal vesicle invasion (SVI) and lymph node status based on levels of PSA, clinical stage and Gleason score from prostate biopsy. The number of mathematical models used to predict the pathological stage has increased over the past 10 years. One systematic review identified 16 predictive and 22 prognostic models suitable for clinical use, most of them requiring external validation (4). According to Touijer and Scardino (5), there is a large degree of uncertainty when assessing the prognosis and predicting the outcomes in PCa management.
In 2006, Crippa et al. (6) published the first population-based study in Brazil aiming to predict organ-confined disease (OCD). PSA levels, Gleason score from prostate biopsies and percentages of positive biopsy cores (PPBC) were used as predictor variables. The model was constructed and internally validated on 898 private-practice patients who underwent radical retropubic prostatectomy (RRP) performed by one surgeon. The corresponding surgical specimens were examined by the same pathologist. The resulting nomograms correctly estimated OCD in 91.1% of patients. The main limitations of predictive nomograms include the lack of external validation and of periodic updates to accommodate changes occurring over time in populations, diseases and diagnostic methods (4). Prediction tools become increasingly robust as they are successively validated in distinct environments because the variability improves the accuracy (calibration and discrimination) and the generalizability of the model (7). As a consequence, adjustment is strongly indicated when applying a prediction model to populations with distinct characteristics or when temporal changes in disease or variable behavior are suspected (8).
The University Hospital of the University of São Paulo Medical School and the Cancer Institute of the State of São Paulo are public reference centers for PCa in Brazil. As tertiary centers, their population of patients is quite heterogeneous, as most patients have their biopsies performed at their original institutions, while surgical procedures and pathological examinations are performed by supervised residents at distinct levels of training. We hypothesized that such heterogeneity could (a) significantly challenge the generalizability and transportability of a nomogram constructed on a more homogeneous population and (b) require adjustments of the predictive nomogram to accommodate the characteristics of the population assisted at public tertiary centers.
The objectives of this study were (a) to perform the external validation of Crippa and colleagues' nomograms and (b) to develop an adjusted nomogram for prediction of organ-confined disease and seminal vesicle invasion based on the population assisted at the abovementioned public tertiary institutions (USP nomograms).

MATERIALS AND METHODS
This study was approved by the Institutional Review Board. The requirement for patients' informed consent was waived. Electronic medical records of 1,094 consecutive prostate cancer patients who underwent RRP by the Walsh technique (9) as modified by Srougi (10) between January 2005 and December 2010 were retrospectively reviewed. All surgeries were performed by a urology resident assisted by an experienced urologist. The following data were extracted: (a) clinical staging based on the 2002 TNM classification (11), with the T class based on rectal examinations performed by urology residents and confirmed by a faculty urologist; (b) preoperative PSA levels, which were updated within the institution if measured more than 90 days before the preoperative consultation; (c) prostate biopsy findings, including total number of specimens obtained, number of positive fragments, and Gleason scores (12) stratified into primary and secondary components, and total scores; (d) TNM pathologic staging (11) and Gleason histological classification based on electronic reports of standardized pathological examinations of surgical specimens consisting of prostate, seminal vesicles and, when applicable, the removed lymph nodes. Organ-confined disease was defined as the absence of tumoral cells in periprostatic adipose tissue and/or in neurovascular bundles. Seminal vesicle invasion was characterized by the infiltration of tumoral cells not limited to the adventitia.

Statistical analysis
The sample size was based on a 34% reported prevalence of non-OCD (6) and four predictor variables, with 10 and 25 events per variable, which required 118 and 294 subjects, respectively (13).
Accordingly, the available sample of 1,002 subjects was considered suitable for the study.
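The events-per-variable arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name is hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the reported 118 and 294 (rounding up would give 295 for the second case).

```python
def required_sample_size(n_predictors, events_per_variable, event_prevalence):
    """Subjects needed so that the expected number of events reaches
    n_predictors * events_per_variable (hypothetical helper)."""
    events_needed = n_predictors * events_per_variable
    # Assumption: round to the nearest integer, which matches the reported values.
    return round(events_needed / event_prevalence)


# Four predictors, 34% prevalence of non-OCD, 10 and 25 events per variable.
for epv in (10, 25):
    print(epv, required_sample_size(4, epv, 0.34))
```

With either criterion, the requirement is well below the 1,002 patients available.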
This study was based on the premise that the current sample would differ in significant aspects from that of Crippa and colleagues' original study (6). To test this hypothesis, demographic data, including the clinical stage, PSA values, and pathological findings presented in Table-1 of the original study were compared to data from the sample of the current study by two-sided unpaired t-tests and z-tests for proportions, as appropriate. The outcomes of interest were organ-confined disease and seminal vesicle invasion.
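As an illustration of the z-test for proportions, the sketch below compares the T1c prevalence quoted in the Discussion (62.7% in this sample versus 48.1% in the original study). It is only a sketch: the function name is hypothetical, a pooled standard error is assumed, and the denominators (1,002 and 898) are assumed to be the full samples of each study.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions,
    using the pooled standard error (hypothetical helper)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed denominators: all 1,002 current patients and all 898 original patients.
z, p = two_proportion_z_test(0.627, 1002, 0.481, 898)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```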

External validation of Crippa and colleagues' nomograms
Crippa and colleagues' nomograms were designed to predict OCD. For external validation, the probability of OCD was estimated for each patient in the validating sample as the average probability predicted by nomograms 1 and 2 of the original study (6), based on the respective ranges of PSA levels (0-4, 4.1-10, 10.1-20, and above 20 ng/mL), of Gleason scores (2-6, 7 and 8-10 in nomogram 1, and 2-6 and 7-10 in nomogram 2) and PPBC (0-25%, 25.1-50%, 50.1-75% and 75.1-100%). Receiver operating characteristic (ROC) curves and the respective areas under the curves (AUC) assessed the discriminatory capability of the nomograms. Brier scores estimated the predictive performance of the nomograms as the mean squared deviation between predicted and observed outcomes, where 0 indicates perfect prediction and 0.25 corresponds to a non-informative model (a constant predicted probability of 0.5). The extent of nomogram over- or underestimation was explored graphically with LOESS calibration plots (14). Coincidence of the curves best fitted to scatterplots of predicted and observed outcomes with the diagonal lines on the plots indicates good model calibration along the ranges of prediction.
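The two summary measures described above can be computed directly from predicted probabilities and observed binary outcomes. The sketch below is illustrative only: the function names and the toy data are assumptions, and the AUC is obtained from the equivalent pairwise (Mann-Whitney) formulation rather than from Stata's ROC routines.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def roc_auc(probabilities, outcomes):
    """Probability that a randomly chosen positive case receives a higher
    prediction than a randomly chosen negative case (ties count as 0.5)."""
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in positives
        for b in negatives
    )
    return wins / (len(positives) * len(negatives))


def averaged_prediction(prob_nomogram_1, prob_nomogram_2):
    """Average of the two nomogram predictions for one patient."""
    return (prob_nomogram_1 + prob_nomogram_2) / 2


# Toy data (hypothetical): a constant prediction of 0.5 is non-informative.
observed = [1, 1, 0, 0]
print(brier_score([0.5] * 4, observed), roc_auc([0.5] * 4, observed))
```

A constant prediction of 0.5 yields a Brier score of 0.25 and an AUC of 0.5, which is the non-informative reference point mentioned in the text.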

Construction of the USP nomograms
The USP nomograms aimed to predict both OCD and SVI based on ranges of PSA levels, of clinical stages, of Gleason scores and of PPBC.
Chi-squared tests were used to assess the association between predictor variables and binary outcomes (OCD and SVI). Significantly associated variables were entered into stepwise logistic regression analyses to identify independent predictors of the respective outcomes. Final coefficients, odds ratios and the respective 95% confidence intervals were obtained from 1,000 bootstrap resampling procedures (15). Hosmer and Lemeshow tests were used to verify the adequacy of the models. ROC curves were constructed. The areas under the curves and the positive and negative predictive values of each model were used to assess discriminatory capability. Brier scores were calculated and LOESS plots were constructed, as described above.
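The bootstrap step can be illustrated with a deliberately simplified stand-in. Instead of refitting a multivariable logistic model on each resample, as the study did, the sketch below resamples patients with replacement and recomputes the odds ratio of a single binary predictor, then takes percentile limits. The data are synthetic, the function names are hypothetical, and the 0.5 added to each cell is an assumed continuity correction to avoid empty cells.

```python
import random


def odds_ratio(pairs):
    """Odds ratio from (predictor, outcome) 0/1 pairs; 0.5 is added to each
    cell of the 2x2 table (assumed continuity correction)."""
    a = sum(1 for x, y in pairs if x == 1 and y == 1) + 0.5
    b = sum(1 for x, y in pairs if x == 1 and y == 0) + 0.5
    c = sum(1 for x, y in pairs if x == 0 and y == 1) + 0.5
    d = sum(1 for x, y in pairs if x == 0 and y == 0) + 0.5
    return (a * d) / (b * c)


def bootstrap_percentile_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = sorted(
        odds_ratio([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic data (not from the study): 200 patients, one binary predictor.
data = [(1, 1)] * 60 + [(1, 0)] * 40 + [(0, 1)] * 30 + [(0, 0)] * 70
low, high = bootstrap_percentile_ci(data)
print(f"OR = {odds_ratio(data):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Agreement between such bootstrap limits and the limits estimated on the original sample is the kind of consistency check that internal validation relies on.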
Statistical analyses were performed with Stata v.10 (StataCorp LP, College Station). The significance level (alpha) was set at 0.05.

RESULTS
Of the total of 1,094 patients, seventy-seven incomplete records, thirteen records of patients who received neoadjuvant hormonal therapy and two records of patients diagnosed following endoscopic resection of the prostate were excluded, resulting in 1,002 patients.

External validation of nomograms from Crippa et al.
Table-1 shows the demographic, clinical and pathological data of patients in this study sample compared with the data from Crippa and colleagues' study. Significant differences were observed with respect to age, clinical stage, pathological stage, Gleason score (7 and 8-10 categories), total number of cores, number of positive cores and PPBC. Figure-1 shows the ROC curves of predictions based on Crippa and colleagues' nomograms against the observed outcomes of patients in the validation sample. AUC and the respective 95% confidence limits for the predictions based on nomograms 1 and 2 were 0.68 (0.65-0.70) and 0.68 (0.65-0.71). Both nomograms had Brier scores of 0.17.
The LOESS plots for predictions based on both nomograms suggest considerable overestimation of OCD in all ranges of prediction (Figure-2).

USP nomograms
Table-2 shows the results of the chi-squared tests used in univariate analyses. No significant associations with either OCD or SVI were found for the total number of fragments in prostate biopsies or patient's age. PSA levels, Gleason scores, PPBC and clinical stage categories were significantly associated with both outcomes. Table-3 summarizes the final logistic models, with bootstrap odds ratio 95% confidence limits. The final categories of PSA levels and Gleason scores that best fitted the model resulted from the collapsed ranges. Clinical stage was rejected from the final logistic models (Figure-3).
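A nomogram built from a logistic model reports, for a given combination of predictor categories, the probability obtained by passing the sum of the intercept and the applicable coefficients through the logistic function. The sketch below shows that conversion. The intercept, the category names and the odds ratios are hypothetical placeholders, not the values in Table-3.

```python
import math


def predicted_probability(intercept, coefficients, indicators):
    """Logistic-model probability for one patient. `indicators` maps each
    category name to 1 if it applies to the patient and 0 otherwise."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in indicators.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def coefficient_from_odds_ratio(odds_ratio):
    """A logistic coefficient is the natural log of the odds ratio."""
    return math.log(odds_ratio)


# Hypothetical odds ratios and intercept, for illustration only.
coefficients = {
    "psa_above_10": coefficient_from_odds_ratio(0.5),
    "gleason_7_to_10": coefficient_from_odds_ratio(0.6),
    "ppbc_above_50": coefficient_from_odds_ratio(0.4),
}
patient = {"psa_above_10": 1, "gleason_7_to_10": 0, "ppbc_above_50": 1}
print(round(predicted_probability(1.5, coefficients, patient), 3))
```

Tabulating this probability for every combination of categories gives the percentage grid that a clinician reads from the nomogram.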
The LOESS plots in Figure-4 depict the calibration of USP nomograms. Visual inspection revealed that the OCD curve was mostly coincident with the diagonal line, suggesting good calibration in all segments except for the 15% through 30% range, where underestimation occurred. LOESS plots for SVI predictions suggest underestimation of SVI in all ranges of prediction above 2%.
Hosmer and Lemeshow test: χ² = 3.9; df = 6; p = 0.67.

DISCUSSION
This study has shown moderate discriminative power and considerable OCD overestimation bias of Crippa and colleagues' nomograms. Conversely, the USP nomogram exhibited good discriminative power and calibration for prediction of OCD, whereas the USP nomogram for prediction of SVI considerably underestimated that outcome. Overoptimistic behavior of predictive models in external validation processes is a common and widely acknowledged phenomenon (16). The wide and persistent use of Partin tables is based more on their clinical usefulness than on their statistical performance (17). Similarly, in spite of the modest predictive accuracy and informative performance of Crippa and colleagues' nomograms, by incorporating relevant independent predictors of OCD, they may contribute valuable prognostic information. In doing so, and considering their limitations, they may be used as a better alternative to clinical staging. The moderate predictive performance of such nomograms may have resulted from sample biases. Such a finding supports the original hypothesis of this study, according to which the heterogeneity of the tertiary teaching center population might disclose possible weaknesses of nomograms developed in a more homogeneous population. The next logical step in the study was to create new nomograms based on the same variables but originating from and validated in the population of our teaching institutions, including a prediction model for SVI.
The USP predictive models exhibited consistency, as confidence intervals of the original sample coincided with those obtained by bootstrap, and adequate predictive performance, as assessed by the tests of Hosmer and Lemeshow. Areas under the ROC curves greater than 0.7 and overall percentage of correct classification equal to 73% (OCD) and 77% (SVI) suggest a moderate to high discriminatory ability of both models. Similarly, calibration of the OCD predictive model was robust. In contrast, LOESS diagrams for the SVI predictive model showed underestimation of the outcome in all ranges of prediction over 2%.
It has been suggested that proper calibration of a nomogram is more clinically useful than is its discriminatory capability (7). Accordingly, USP nomograms for predicting OCD can be clinically useful. Conversely, given its poor calibration, the SVI nomogram demands extensive external validations and variable adjustments to improve its accuracy. Partin tables have demonstrated good discriminating capability (AUC = 0.74) (18). A recent validation in the Surveillance, Epidemiology and End Results (SEER) dataset showed appropriate discrimination of the Partin tables, but the study did not report on their calibrations (18). In contrast, a European validation study failed to confirm their accuracy (17). Reasons for these conflicting results may include the fact that at the time of its construction, in 1993, only 39% of patients had a non-palpable tumor at diagnosis (19). An increasing prevalence of T1c tumors at diagnosis has occurred over recent decades (20). In Crippa and colleagues' study, clinical stage T1c was present in 48.1% of patients diagnosed between 1988 and 2002, while in our sample, 62.7% of patients were T1c. These findings justify repeated adjustment and revalidation procedures to accommodate disease and population changes over time.
The superiority of PPBC associated with PSA levels and Gleason scores over the clinical stage in predicting extra-prostatic disease has been demonstrated (6,21). This study confirmed these findings.
The use of only a few variables is desirable in nomograms to increase utility in busy practices (16). Clinically useful nomograms should be applicable to individual patients and provide this information as percentages of outcome likelihood (22). The nomograms in this study fulfilled the abovementioned requirements.
Seminal vesicle preservation during RRP may improve erectile function and urinary continence (23). Seminal vesicle involvement demands a wider radiation field during radiotherapy (24) and is associated with higher rates of biochemical recurrence and worse prognosis (25). Such practical issues stress the importance of accurate prediction of organ-confined disease.
The growing number of low-risk PCa patients managed by active surveillance continues to generate controversy about the concepts of indolent disease, the criteria for treatment and the impact on patient survival compared to treated patients (26). Nomograms for predicting indolent disease (27) are in use, but require extensive external validation.
This study included 419 (41.1%) low-risk patients who were treated surgically. Of these, 86.4% had OCD. The USP nomogram predicted 86.1% of OCD in low-risk cases. Active surveillance requires periodic measurements of PSA and repeated prostate biopsies (28). The availability of such data allows sequential recalculations of OCD likelihood in USP nomograms.
The retrospective nature of this study and the impossibility of reviewing prostate biopsies derived from several centers may have biased our data. Furthermore, clinical staging did not include imaging examinations. Inter-examiner bias may have caused occasional misclassifications of clinical stage and of pathological examinations of surgical specimens (29).
The abovementioned bias-inducing factors are inherent to retrospectively collected data from referral teaching centers and were acknowledged during planning. This study shares these features with other major validation studies.
The predictive value of future models may increase with the inclusion of additional variables such as angiolymphatic or perineural invasion and novel cellular, molecular, and genetic biomarkers (30).

CONCLUSIONS
USP nomograms showed good discrimination for OCD and SVI, with excellent calibration for OCD but underestimation of SVI.

CONFLICT OF INTEREST
None declared.

Figure 2 - LOESS plots for predictions based on both nomograms. Considerable overestimation of OCD in all ranges of prediction is suggested.

Figure 3 - ROC curves for prediction of OCD and SVI.

Figure 4 - LOESS curves of the probability distribution for OCD and SVI.