ASSESSMENT OF SCORES IN DECISION MAKING IN METASTASES OF THE SPINE

Objective: The aim of this study is to assess the intra- and interobserver concordance of the SINS, Harrington, Tokuhashi, and Tomita scores among general orthopedic surgeons and spine surgeons with more than 5 and more than 10 years of experience in the evaluation of patients with spinal metastasis. Methods: Twenty cases of patients with metastatic lesions of the spine were presented to 10 examiners, who applied the aforementioned scores. After six weeks, the cases were reintroduced in a different order and the data were analyzed. Results: Intraobserver reliability showed better agreement for the SINS score among examiners with less experience, and for the Harrington and Tomita scores among those with more than 10 years of experience. Interobserver reliability in the group with more than 10 years of experience showed higher precision when using these scores, especially Harrington and Tomita. The SINS score was the choice for daily practice and was able to modify management more often. Conclusions: This study demonstrated that the scores predicting instability (Harrington) and prognosis (Tomita) had higher intra- and interobserver reliability, particularly among spine surgeons with more than 10 years of experience.


INTRODUCTION
The spine is the most common site of metastatic disease. 1,2 Patients with cancer present spinal metastases in 70% of cases and up to 10% develop spinal cord compression. 2,3 The ratio of these lesions to primary tumors is 40:1 and skeletal dissemination should be considered in the differential diagnosis of a patient with a spinal cord lesion. 1,4,5 Metastatic involvement of the spine is more common in primary tumors of the breast, lung, kidney, thyroid gland, and the prostate, in that order, according to Papastefanou et al. 6 In our practice, Valesin Filho et al. 7 described the prevalence in 55 patients: breast (32.7%), multiple myeloma (25.4%), prostate (14.5%), gastric carcinoma (5.4%), neoplasia of the lung (3.6%), neoplasia of the kidney (3.6%), and others (14.5%).
The involvement of the spine is a common problem and its incidence is increasing because the methods of detection, screening, and treatment of primary cancer are allowing patients with active disease to live longer. 6,8,10-12 The main classification systems take into account instability (the Spine Instability Neoplastic Score [SINS] 8 and Harrington 13 ) and prognosis (Tokuhashi et al. 10 and Tomita et al. 14 ), and help reduce uncertainty in the decision-making process. 6 This study therefore attempts to evaluate the intra- and interobserver concordance of the SINS, 8 Harrington, 13 Tokuhashi et al., 10 and Tomita et al. 14 scores among general orthopedic surgeons and spine surgeons with more than 5 and more than 10 years of experience in the evaluation of patients with vertebral metastasis.

MATERIAL AND METHODS
We retrospectively evaluated the medical records of 20 patients with metastatic lesions in the spine treated at the Spine Clinic of Hospital São Paulo, of the UNIFESP Department of Orthopedics, Escola Paulista de Medicina. The study was submitted for the approval of the Research Ethics Committee of this institution, under number 959680. The patients' identities were not revealed to the evaluating physicians.
The study included patients with metastatic disease of the spine, verified in the case histories, clinical exams, and imaging exams. Five patients with multiple myeloma were also included, as we believe that these patients have natural histories, behaviors, and treatments similar to those of the patients with metastases. Insufficient information was adopted as the criterion for exclusion from the study.
The cases were presented to four general orthopedists, four spine surgeons with more than 5 years of experience, and two spine surgeons with more than 10 years of experience. They were displayed in PowerPoint with a list of symptoms, laboratory and imaging (radiography, tomography, and magnetic resonance) exams, and biopsy results. The examiners were then given the SINS 8 (Tables 1 and 2), Harrington 13 (Table 3), Tokuhashi 10 (Tables 4 and 5), and Tomita 14 (Tables 6 and 7) scores for their assessment of the cases. After six weeks the cases were reviewed again in a different order, to reduce memory bias, with a restatement of the scores presented in the tables and a self-assessment questionnaire consisting of three questions: (1) Do you use these scores routinely in treating patients with vertebral tumors? (2) If you do not use them, what is your method of choice for daily practice? and (3) If the treatment indicated by the score is different from yours, which classification system would change your conduct? The responses were stored for the calculation of intra- and interobserver reproducibility.

RESULTS
A convenience sample was taken, consisting of 20 patients with an average age of 54.5 years, mostly women (70%). The diagnosis was

[Table 1, SINS score, partial rows]
Pain: Yes, 3; No (occasional pain, but not mechanical), 1; Pain-free lesion, 0.
[...] None of the above, 0.
Posterolateral involvement of the spinal elements (facet joints, pedicle or costovertebral joint fracture or replacement with tumor): [...]

Intraobserver reliability (stability over time and repeatability) 15
Figure 1 shows, according to the Bland and Altman method, 16 the average difference between each pair of observations made by the ten examiners (y axis) and how much these differences deviate from zero (ideal) and from the line of average difference (real bias) for all observations. In the Tomita and Tokuhashi tools, only one pair of sets of observations showed a difference beyond the limits of confidence of the bias line.
One can see that the real bias line in all three tools approximates zero, denoting a small average difference between the two application times of the tools. The distance between the bias line and zero was tested for a difference from zero; the result was not significant, so the null hypothesis of no difference was not rejected.
No significant correlations were observed between the bias and the magnitude of the measurement, i.e. there is no association between the average distance of the pairs of observations and the average score obtained.
To summarize, we can say that the tools analyzed using the Bland and Altman 16 method are stable during the time interval between the two measurements and there is an acceptable level of concordance between the scores obtained by the same examiner on two separate occasions.
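The Bland and Altman analysis described above can be sketched in a few lines. This is a minimal illustration, not the computation used in the study: the scores are hypothetical, the function name `bland_altman` is ours, and the limits are taken as the conventional bias ± 1.96 standard deviations of the paired differences, which we assume corresponds to the limits drawn around the bias line in Figure 1.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    bias is the mean of the paired differences (first - second);
    the limits are bias +/- 1.96 * sample SD of those differences
    (the conventional choice, assumed here).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical total scores given by one examiner to six cases,
# at the first presentation and again six weeks later.
round1 = [7, 9, 12, 5, 10, 8]
round2 = [8, 9, 11, 5, 11, 8]

bias, lower, upper = bland_altman(round1, round2)

# Pairs whose difference falls outside the limits would be the
# outlying points in a Bland and Altman plot.
outside = [a - b for a, b in zip(round1, round2) if not lower <= a - b <= upper]

print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f}), outside={outside}")
```

A bias close to zero with few or no differences outside the limits is what the text above interprets as stability between the two measurement times.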
Table 8 shows the degree of concordance between the stability/prognosis categories of the 10 examiners, stratified according to the level of expertise and hands-on experience using the tools.
The SINS tool yielded one pair of observations with perfect intraobserver concordance, another pair with substantial concordance, and three pairs with moderate concordance, while the other pairs did not demonstrate reasonable concordance between the two time periods. The group of general orthopedic surgeons presented the highest degrees of concordance, with a Kappa average of 0.44 and half of the observations with substantial concordance or better.
In the evaluation using the Tomita tool, the group with more than five years of experience had a substantially higher frequency of concordant observations, though the highest Kappa average (0.62) was observed in the group with more than 10 years of experience.
In general, the Tokuhashi tool presented low intraobserver concordance, with 60% of the observations at a reasonable degree of concordance or worse. The Kappa average was 0.34 and was higher among the examiners with more than 10 years of experience.
The Harrington tool had a higher proportion of observers with almost perfect or substantial concordance (30%), while the other 70% of the observations obtained reasonable or moderate concordance. The Kappa averages for the groups by expertise/experience were 0.45, 0.52, and 0.59 for general orthopedic surgeons, spine surgeons with more than 5 years of experience, and spine surgeons with more than 10 years of experience, respectively.
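For readers unfamiliar with the Kappa statistic reported above, the following sketch computes Cohen's unweighted Kappa for two sets of categorical ratings and maps the value to a verbal label. Several points are assumptions: the ratings are hypothetical, the function names are ours, the study may have used a weighted variant, and we assume the verbal labels in the text follow the Landis and Koch bands, with "reasonable" corresponding to their "fair".

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's unweighted kappa for two equally long lists of categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    # Observed proportion of cases rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use one and the same category throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch bands, assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical stability categories assigned by one examiner to ten
# cases on two occasions (s = stable, i = indeterminate, u = unstable).
first = ["s", "i", "u", "s", "i", "u", "s", "i", "u", "s"]
second = ["s", "i", "u", "s", "i", "i", "s", "i", "u", "u"]

kappa = cohen_kappa(first, second)
print(f"kappa={kappa:.2f} ({landis_koch(kappa)})")
```

Under these assumed bands, the Kappa averages quoted in the text read as follows: 0.34 (Tokuhashi) is fair, 0.44 (SINS, general orthopedic surgeons) is moderate, and 0.62 (Tomita, more than 10 years) is substantial.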
Interobserver reliability (reproducibility) 15
Table 9 gives an overview of reproducibility among the scores obtained using the tools of the study. For the SINS tool, intraclass correlation coefficients (ICC) closer to 1 were observed among the experienced examiners, suggesting better concordance between the diagnoses. This finding was not observed in the analysis of the ICC of the interobserver scores for the Tomita and Tokuhashi tools. The examiners in the group with more than 10 years of experience had lower average coefficients of variation (CV), especially for the Tomita tool, denoting greater precision in its use.
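The two reproducibility measures named above can be illustrated as follows. This is a sketch under stated assumptions: the scores are hypothetical, the function names are ours, and the ICC shown is the one-way random-effects form, ICC(1,1); the text does not state which ICC model was used, so the values in Table 9 may have been obtained with a different form.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Percent CV of the scores several examiners gave to one case."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(scores) / m


def icc_oneway(table):
    """One-way random-effects ICC(1,1).

    table[i][j] is the score examiner j gave to case i; every case
    must be scored by the same number of examiners.
    """
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need >= 2 cases, each with the same >= 2 scores")
    grand = mean([v for row in table for v in row])
    row_means = [mean(row) for row in table]
    # Mean square between cases and mean square within cases.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2
                    for row, m in zip(table, row_means)
                    for v in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical total scores: four cases (rows), three examiners (columns).
scores = [[8, 9, 8], [12, 12, 13], [5, 5, 5], [10, 11, 10]]

icc = icc_oneway(scores)
average_cv = mean(coefficient_of_variation(row) for row in scores)
print(f"ICC(1,1)={icc:.3f}, average CV={average_cv:.1f}%")
```

An ICC near 1 means that most of the score variance comes from differences between cases rather than between examiners, while a low average CV means that examiners scoring the same case give similar values.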

For the analysis of concordance of stability/prognosis among the professional groups, Table 10 demonstrates greater precision among the examiners for the Harrington tool. There was better intra-examiner concordance (k=0.45) in the group of general orthopedic surgeons using SINS. The group with more than 5 years of experience performed better than the other groups using the Tokuhashi tool. The group of examiners with more than 10 years of experience seemed to have better precision using the Tomita and Harrington tools.
Regarding the self-assessment questions, we observed that the general orthopedists did not use the scores to treat cases (100%) and that they were more inclined to use the SINS tool. The examiners with experience used the scores routinely, most often Tokuhashi and SINS. The latter was the most frequently used and the most likely to change the management of a case (83%).

DISCUSSION
Our study was developed to evaluate the use of scores in patients with vertebral metastases. The treatment decision in these patients may be modified through the use of classifications that are capable of evaluating instability (SINS 8 and Harrington 13 ) and prognosis (Tokuhashi et al. 10 and Tomita et al. 14 ). The development of simple classification systems, based on easily assessed radiographic attributes and patient factors, that facilitate communication and appropriate referrals between oncologists, radiologists, orthopedic and spine surgeons, and neurosurgeons helps to ensure that treatment plans are faster and better optimized.
Establishing criteria for a surgical indication is very difficult due to the variety of symptoms and survival prognoses. 10,17 Early detection and proper intervention are critical to minimizing the sequelae from spine metastases, reestablishing function, and maximizing the quality of life. 18
There was almost perfect inter- and intraobserver reliability in the total SINS scores for the three clinically relevant categories of tumor-related instability: stable (score from 0 to 6), indeterminate or imminent instability (7 to 12), and unstable (13 to 18). 8,17,19 We observed that all the examiners rated it as important in daily practice and capable of changing their conduct in relation to a case. One criticism of the SINS score was that the neurological status of the patient, a potential modifier of the treatment approach, is not included in the evaluation.
Harrington designed a classification scheme with five categories for metastatic tumors of the spine based on bone destruction and neurological impairment. 13 In this score, surgery is indicated only in the presence of vertebral instability or mechanical pain. The Harrington classification had the best interobserver precision among the spine specialists, almost perfect among those with more than 10 years of experience (Kappa = 0.81). However, this system is oversimplified, resulting in broad categories of patients who can have very different prognoses. 2 For example, a patient with radicular pain but good function can be allocated to the same group as a patient with complete paralysis from a large tumor. 2
In 1990 Tokuhashi et al. 10 elaborated a treatment and procedure selection strategy based on life expectancy. In 2005 they revised the tool to improve the precision of this system. The score interval for the parameter "primary site" was changed to 0 to 5 points and the maximum possible total score was increased to 15 points. Pre-treatment prognostic evaluation is the most important factor in determining the selection of treatment methods, including surgical procedures. Using this system, life expectancy was consistent with real post-treatment survival time in 86.4% of cases in a prospective series of 118 patients and in 82.5% of cases in a retrospective series of all 246 cases. 10 Our study showed low intraobserver concordance, with 60% of the observations at a reasonable degree of concordance or worse and a Kappa average of 0.34. Despite these findings, in the self-assessment questionnaires this score was included in the preferences of the spine specialists as one of the scores of choice, and as a score capable of changing the treatment approach in 80% of the cases. This fact may be attributed to the presence of a large number of variables and the inclusion of neurological status.
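The three SINS stability categories quoted earlier in this discussion (0 to 6, 7 to 12, and 13 to 18) amount to a simple lookup from total score to category. The sketch below illustrates that mapping only; the function name `sins_category` is ours, and the component scoring that produces the total is not reproduced here.

```python
def sins_category(total):
    """Map a total SINS score (integer, 0 to 18) to its stability category."""
    if not isinstance(total, int) or not 0 <= total <= 18:
        raise ValueError("total SINS score must be an integer from 0 to 18")
    if total <= 6:
        return "stable"
    if total <= 12:
        return "indeterminate"
    return "unstable"


# Hypothetical totals assigned by one examiner to four cases.
for total in (4, 7, 12, 15):
    print(total, sins_category(total))
```

Two examiners whose totals differ by a point or two can still agree on the category, which is why agreement on categories and agreement on raw scores are reported separately in this study.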
Tomita et al. 14 recommend an alternative prognosis scoring system, taking the histology of the tumor and its biological behavior into account. This system was built based on the retrospective data of 67 patients between 1987 and 1991, and weighted point values were given to prognostic factors following an evaluation of their statistical rates of risk. The histology of the primary tumor is closely correlated with survival in both surgical patients and medical cohorts, with greater survival times observed in patients with breast, prostate, and thyroid cancers. Therefore, the type of primary tumor played a predominant role in the scoring of the Tomita system (apud Choi et al. 2 ). This score had the lowest coefficient of variation (7%) among the examiners with more than 10 years of experience, demonstrating better interobserver concordance, though it was neither the tool of choice for daily practice nor selected as a modifier of approach.
There is controversy in the literature concerning the use of the Tokuhashi and Tomita scores in patients with myeloma. 20,21 Leithner et al. 20 proposed the inclusion of multiple myeloma among the malignancies with longer survival times, despite its being a hematological disease rather than a metastatic disease spread from a solid tumor, and suggested that these patients be included in a group with a better prognosis. Thus, in the Tokuhashi score, the primary site would receive 5 points, and in the Tomita score, a classification as a slow-growing primary site would receive 1 point. The study by Majeed et al. 21 also included patients with myeloma in the Tokuhashi and Tomita scores, given that they are allocated to the same category as patients with metastasis of the prostate and breast and because a similar survival was observed in the study by these authors. In our study we observed great variability in our attempt to include myeloma in these scores. In general, the patients in poorer clinical condition were allocated to the groups with poorer prognoses. In the Tokuhashi score, patients were classified by primary site in the "other" and "unidentified" categories, while in the Tomita classification, most cases were placed in the rapid growth group. This placed these patients in categories that call for less aggressive or conservative treatment, thereby changing the management of these cases. In our practice, Avanzi et al. 22 studied the correlation between spinal fractures and survival in patients with multiple myeloma using the Tokuhashi and Tomita scores. These authors were not able to predict survival using these scores. The inclusion of myeloma in well-defined categories may help to better identify these patients.

CONCLUSION
This study demonstrated that scores that predict instability, that of Harrington in particular, and prognosis, mainly Tomita, have a higher level of intra- and interobserver reliability among spine surgeons with more than 10 years of experience. The SINS score was the instrument of choice for daily practice, and the one that most often led to a change of conduct.
All authors declare no potential conflict of interest concerning this article.

Figure 1. Bland and Altman 16 method for analysis of the repeatability of the stability/prognosis scores between two different observations. São Paulo, 2014.

Table 1. SINS score.

Table 2. Interpretation of SINS.

Table 5. Strategy for the treatment of metastases according to the Tokuhashi score.

Table 7. Strategy for the Tomita score.

Table 8. Intraobserver concordance according to the Kappa method.

Table 9. Reproducibility according to the intraclass correlation coefficient (ICC) and the average coefficient of variation (CV), calculated for the scores obtained.

Table 10. Interobserver concordance according to the Kappa method.