COMPARATIVE EVALUATION OF THE PSYCHOMETRIC PROPERTIES OF ORTHOPEDIC SCALES FOR LOW BACK PAIN

Objective: To evaluate the reliability, responsiveness and validity of four scales for low back pain and to correlate these scales with each other and with the Self-Administered Comorbidity Questionnaire (SACQ). Methods: We evaluated the psychometric properties of four previously selected scales for low back pain, the Roland-Morris Disability Questionnaire (RMDQ), the Quebec Back Pain Disability Scale (QBPDS), the Waddell Disability Index (WDI) and the Back Pain Functional Scale (BPFS), and of the SACQ comorbidity scale. Exploratory and confirmatory factor analyses were conducted. Reliability and internal consistency were measured by Cronbach's alpha. Validity was assessed through the correlation of the scales with the SACQ and through a structural equation analysis of the relationships between them. Results: The scales showed adequate indicators based on their factor structure, with Kaiser-Meyer-Olkin values above 0.90. After the exploratory factor analysis, all scales showed fit indicators suited to a factor model, following the same pattern as the original validations. They also showed good internal consistency (Cronbach's alpha greater than 0.78). The Roland-Morris was the only scale with factor loadings suggesting the exclusion of an item. In terms of validity, the scales showed similar positive correlation coefficients with the SACQ and with each other. Conclusion: The scales evaluated showed similar indications of reliability and internal consistency, so we did not find sufficient evidence to recommend one scale over another. Level of Evidence I; Diagnostic studies – Investigation of a diagnostic test.


INTRODUCTION
Low back pain is currently an international health issue. It is estimated that 70% of the population in developed countries will have this symptom at some point in their lives. [1][2][3] Among the most frequent reasons for seeking medical care, low back pain is second only to upper respiratory tract disorders. 4,5 It is the most common cause of work disability in the United States in people under 45 years of age. 6,7 According to US statistics, the individual cost per person with low back pain is $8000, and the total annual cost of this condition ranges from 38 to 50 billion dollars. 8 Self-report low back pain questionnaires, such as the Roland-Morris Disability Questionnaire (RMDQ), 9 the Quebec Back Pain Disability Scale (QBPDS), 10 the Waddell Disability Index (WDI) 11 and the Back Pain Functional Scale (BPFS), 12 are used routinely in medical clinics and clinical studies: they allow patient conditions to be evaluated before and after a given treatment, allow the course of the disease to be monitored, and allow results to be compared in multicenter studies. [13][14][15] The psychometric properties of these low back pain scales have been tested in outpatient samples and published. However, to the best of our knowledge, this is the first study to compare the psychometric properties of these scales in the general US population presenting symptoms of low back pain using the Amazon Mechanical Turk (MTurk) online platform. MTurk is a website that simplifies study design and data collection and provides access to a large pool of participants, which is the main requirement for conducting this kind of research. 16 The objective of this study is to compare the psychometric properties of four self-administered questionnaires for low back pain via the MTurk platform and to normalize the results by comparing the scales with each other and with the Self-Administered Comorbidity Questionnaire (SACQ).

METHODS
This study was approved by the Institutional Review Board of the Faculdade Uningá (Maringá-PR) under protocol number FR 489024. The informed consent form was presented as the first page of the online questionnaire, together with a description of the study and its objective, followed by two options: "Yes, I accept" or "No, I don't accept". The study participants agreed to the informed consent form; respondents who did not agree were automatically taken to the end of the survey.
Data collection was conducted through the MTurk website (http://www.mturk.com), which allows research data to be obtained quickly and at low cost. MTurk works through HITs (human intelligence tasks), 17 which are tasks created by requesters to be completed by paid workers. 16 After completing the questionnaire, the worker receives financial compensation, usually less than one US dollar. 18 Studies conducted using MTurk suggest that participants are motivated to complete the tasks intrinsically rather than by the monetary compensation. The quality of data obtained through MTurk meets or exceeds the psychometric standards associated with published research. 16
A total of 395 participants completed a demographic questionnaire, the four questionnaires specific to low back pain, and one questionnaire addressing comorbidities, all described below. We used the Qualtrics (http://www.qualtrics.com) online survey software to host the questionnaires, which were accessed from MTurk: participants followed a link that took them directly to Qualtrics. A filter was programmed so that only participants with low back pain could take part. We also informed participants that the results would be used in medical research and asked them to answer honestly. Qualtrics collected the responses and exported the data as an Excel spreadsheet.

Roland-Morris Disability Questionnaire
This scale consists of 24 Yes/No questions related to physical function, designed specifically to evaluate the disability caused by low back pain and how much it has affected the individual within the past 24 hours. It is a self-administered questionnaire that can be completed in less than 5 minutes. One point is assigned to each "Yes" response, so the final score ranges from 0 (no disability) to 24 (severe disability). 9

Quebec Back Pain Disability Scale
This 20-item condition-specific scale evaluates the level of disability in daily activities in patients with low back pain. It is self-administered and can be completed in 5 to 10 minutes. Each of the 20 activities of daily living is scored from 0 ("without any difficulty") to 5 ("unable to perform"). The item scores are summed to obtain the disability score, which ranges from 0 to 100. 10

Waddell Disability Index
This scale consists of 9 Yes/No questions that assess activities of daily living commonly restricted by low back pain. The final score is the number of positive ("Yes") items and ranges from 0 to 9. The questionnaire is easy to administer and can be completed in about 5 minutes. 11

Back Pain Functional Scale
This 12-item questionnaire evaluates activities of daily living related to low back pain. Each item is scored on a 6-point scale, where 0 indicates the greatest level of disability and 5 indicates no difficulty. The final score ranges from 0, representing the worst functional level, to 60, representing the best functional level. 12
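The summed scoring rules described for the four scales can be sketched in Python as follows. This is a minimal sketch, assuming that responses have already been coded as integers (1 for "Yes" and 0 for "No" on the RMDQ and WDI, 0 to 5 on the QBPDS and BPFS); the names ScaleSpec, total_score, score_range and disability_oriented are hypothetical and do not come from the original instruments or from this study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScaleSpec:
    """Hypothetical description of a summed-score scale."""
    name: str
    n_items: int
    min_item: int
    max_item: int
    higher_is_worse: bool  # True for the three disability scales, False for the BPFS


# Item counts and item ranges as described in the text above.
RMDQ = ScaleSpec("RMDQ", 24, 0, 1, True)     # 24 Yes/No items, total 0-24
QBPDS = ScaleSpec("QBPDS", 20, 0, 5, True)   # 20 items scored 0-5, total 0-100
WDI = ScaleSpec("WDI", 9, 0, 1, True)        # 9 Yes/No items, total 0-9
BPFS = ScaleSpec("BPFS", 12, 0, 5, False)    # 12 items scored 0-5, total 0-60


def score_range(spec: ScaleSpec) -> Tuple[int, int]:
    """Lowest and highest possible total for a scale."""
    return spec.n_items * spec.min_item, spec.n_items * spec.max_item


def total_score(spec: ScaleSpec, responses: List[int]) -> int:
    """Sum the coded item responses after checking item count and item range."""
    if len(responses) != spec.n_items:
        raise ValueError(f"{spec.name} expects {spec.n_items} items, got {len(responses)}")
    for value in responses:
        if not spec.min_item <= value <= spec.max_item:
            raise ValueError(f"{spec.name} item value out of range: {value}")
    return sum(responses)


def disability_oriented(spec: ScaleSpec, total: int) -> int:
    """Illustrative option: re-express a total so that higher always means more disability."""
    low, high = score_range(spec)
    return total if spec.higher_is_worse else low + high - total


print(total_score(RMDQ, [1] * 6 + [0] * 18))  # 6 "Yes" answers -> 6
print(score_range(QBPDS))                      # (0, 100)
```

The higher_is_worse flag records that the BPFS runs in the opposite direction to the three disability scales. Reversing it (60 minus the total) is shown only as an illustrative option and is not a step reported in this study.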

Self-Administered Comorbidity Questionnaire
This short, easy-to-understand, self-administered questionnaire measures comorbidities. It is highly reproducible and is moderately correlated with the Charlson Index, 19 a standard medical comorbidity index. 20

Statistical analysis
Subject characteristics were summarized descriptively using means and percentages with 95% confidence intervals.
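The text does not state how the 95% confidence intervals were computed. A minimal sketch, assuming a normal approximation (z of about 1.96) for means and the Wald interval for proportions, could look like this; the function names are hypothetical, and for small samples a t critical value or a Wilson interval would usually be preferred.

```python
import math
from statistics import NormalDist, mean, stdev

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def mean_ci(values, z=Z_95):
    """Mean with a normal-approximation confidence interval (assumed method)."""
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def proportion_ci(successes, n, z=Z_95):
    """Proportion with a Wald confidence interval (assumed method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near 0% or 100%.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Multiplying the proportion and its limits by 100 gives the percentage form used in descriptive tables.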

Psychometric properties
Reliability (Internal Consistency)
Internal consistency is the degree of interrelationship between the items of a scale. 21 Different items in a questionnaire may ask about the same thing in slightly different ways so that the interviewee's opinion or functional level is captured reliably. We used Cronbach's alpha to determine internal consistency, with coefficients higher than 0.70 indicating good internal consistency.
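Cronbach's alpha for a scale with k items is alpha = k / (k - 1) × (1 - Σ var(item) / var(total score)). The sketch below computes it from a hypothetical table with one row per respondent and one column per item; the function name and data layout are illustrative assumptions, not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from a table with one row per respondent and one column per item.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*rows))                        # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

For example, cronbach_alpha([[1, 2], [2, 1], [3, 3]]) gives about 0.667, below the 0.70 threshold, while perfectly parallel items give 1.0. Because the same n - 1 denominator appears in both variance terms, using population variances instead would give the same alpha.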

Validity
Validity refers to the degree to which a tool measures what it is intended to measure. 22 Construct validity refers to how well a tool's scores relate to those of tools with similar or different purposes or scopes. 23 We assessed construct validity by correlating the scores of each low back pain scale with the SACQ score and with the scores of the other scales. Correlations between the low back pain scales and the SACQ were computed with a correlation coefficient appropriate for normally distributed data.
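The text does not name the correlation coefficient; for normally distributed scores the usual choice is Pearson's r, which is assumed in this sketch. The function name pearson_r and the example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```

For instance, pearson_r([1, 2, 3, 4], [1, 3, 2, 4]) returns 0.8. Applied to real data, x would hold each respondent's total on one low back pain scale and y the same respondents' SACQ totals.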
Factor Analysis (Exploratory Factor Analysis and Confirmatory Factor Analysis)
To investigate the internal structure of the scales, we used exploratory factor analysis (EFA) to reduce the items to a smaller number of common factors that account for their shared variance. The number of factors to retain was determined by eigenvalues greater than 1.00, scree plot analysis, communalities, and interpretability (a model with a theoretical foundation). The EFA was conducted with principal axis factoring and promax (oblique) rotation, and a cutoff of 0.40 was defined for factor loadings.
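The eigenvalue-greater-than-1.00 rule mentioned above can be sketched without external libraries: compute the eigenvalues of the item correlation matrix with the cyclic Jacobi method and count those above 1.00. This sketch covers only that retention step; it assumes a Pearson item correlation matrix and does not implement principal axis factoring, communalities or the promax rotation used in the study. The function names are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(map(float, row)) for row in matrix]   # work on a float copy
    n = len(a)
    for _ in range(max_sweeps):
        # Stop when the squared off-diagonal mass is negligible.
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):                  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_n_factors(corr):
    """Number of factors suggested by the eigenvalue > 1.00 rule."""
    return sum(1 for eigenvalue in jacobi_eigenvalues(corr) if eigenvalue > 1.0)
```

For three items whose inter-item correlations are all 0.5, the eigenvalues are 2.0, 0.5 and 0.5, so the rule retains one factor. As the text notes, this criterion was weighed together with the scree plot, communalities and interpretability rather than used alone.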
The models developed by EFA were tested by confirmatory factor analysis (CFA) as measurement models composed of latent variables. This procedure evaluated model fit and adequacy from the fit indices, factor loadings, and individual reliability of each item. Maximum likelihood was used as the estimation method, given the multivariate normality of the data. The fit indices were the chi-squared statistic; the root mean square error of approximation (RMSEA) (values less than 0.05 are considered an adequate fit); the comparative fit index (CFI) (values greater than 0.95 are accepted as a good fit); the goodness-of-fit index and adjusted goodness-of-fit index (GFI/AGFI) (values greater than 0.90 are interpreted as an acceptable fit); the Tucker-Lewis index (TLI) (acceptable fit with values greater than 0.97); and the Akaike information criterion (AIC), Bayesian information criterion (BIC), and modified expected cross-validation index (MECVI) (lower values indicate a better model compared with the alternatives). 24
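Three of the fit indices listed above have closed-form expressions in terms of the chi-squared statistics of the fitted model (M) and of the baseline independence model (B). The sketch below uses common textbook definitions and the cutoffs stated in this section; it assumes that both chi-squared values and their degrees of freedom are taken from the structural equation software output, and some programs compute RMSEA with N instead of N - 1. GFI and AGFI require the fitted covariance matrices and are not shown. The function names are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 version)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of model M against baseline model B."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)       # = max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def fit_checks(chi2_m, df_m, chi2_b, df_b, n):
    """Compare each index with the cutoff stated in this section."""
    return {
        "RMSEA < 0.05": rmsea(chi2_m, df_m, n) < 0.05,
        "CFI > 0.95": cfi(chi2_m, df_m, chi2_b, df_b) > 0.95,
        "TLI > 0.97": tli(chi2_m, df_m, chi2_b, df_b) > 0.97,
    }
```

For a hypothetical model with a chi-squared of 50 on 20 df, N = 101, and a baseline chi-squared of 500 on 30 df, RMSEA is about 0.122, CFI about 0.936 and TLI about 0.904, so none of the three cutoffs would be met.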
All scales presented adequate indicators based on their factor structure and showed Kaiser-Meyer-Olkin (KMO) values above 0.90. The EFA confirmed the original factor structure for all the scales, assessing a single latent construct (pain). With the exception of the BPFS, the eigenvalue criterion (eigenvalues greater than 1) suggested that more than one factor could be retained in the EFA. However, the scree plot and the KMO supported the hypothesis of a single factor for all scales. The factor structure of the BPFS explained the greatest proportion of variance; however, it was the only one with communality values that suggested a problem. The only scale with factor loadings that suggested the exclusion of an item (communality below 0.50 or factor loading below 0.40) was the RMDQ, whose item 2 had a factor loading below 0.40. After analyzing the behavior of the items in the EFA, all the scales presented adequate fit indices for a factor model, following the same model as the original validations (Table 2).
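The Kaiser-Meyer-Olkin measure compares the squared correlations between items with the squared partial correlations obtained from the inverse of the correlation matrix: KMO = Σr²ij / (Σr²ij + Σp²ij), summed over all pairs i ≠ j. The following sketch computes the overall KMO with a small Gauss-Jordan inverse; it assumes a nonsingular item correlation matrix, and the function names are hypothetical.

```python
import math


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = invert(corr)
    n = len(corr)
    r2 = p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Partial correlation of items i and j controlling for all others.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                p2 += partial ** 2
    return r2 / (r2 + p2)
```

For three items with all inter-item correlations equal to 0.5, the partial correlations are 1/3 and the KMO is 9/13 (about 0.69), well below the values above 0.90 reported here.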
All the scales had good internal consistency (Cronbach's alpha greater than 0.78), indicating a consistent response pattern. Regarding validity, all the scales had similar moderate positive correlation coefficients with the SACQ (0.45 to 0.47) and strong positive correlations with each other (Table 3).

DISCUSSION
To our knowledge, this is the first study to compare the psychometric properties of four questionnaires used for self-assessment of low back pain in a general population recruited through Mechanical Turk, normalizing the results by comparing the scales with each other and with the SACQ. In general, the four scales evaluated demonstrated adequate internal consistency, reliability and validity for evaluating patients with low back pain.
All the scales studied showed evidence supporting the same factor structure as the original versions, that is, a single construct. However, the factor analyses conducted in our study also indicated that all the scales could contain more factors. The same was reported in studies of the WDI, in which component analysis did not succeed in demonstrating a single satisfactory construct. According to the author, four measures of motion could be combined, but the resulting 4-factor structure was weak. [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25] EFAs of the Hungarian 26 and Greek 27 versions of the QBPDS found four and six factors, respectively. We did not find articles in the literature that had conducted an EFA of the RMDQ or the BPFS, and no article reported a CFA of these scales, which we performed for all the scales in our study.
The reliability of the scales was measured by Cronbach's alpha coefficient, considered adequate for values above 0.70. The values obtained in our study ranged from 0.78 (WDI) to 0.93 (BPFS). A previous study that evaluated the psychometric properties of the RMDQ found a similar value (0.96). 28 Likewise, studies that translated and validated the RMDQ into other languages obtained values close to ours (0.91): 0.83 for the Persian translation, 29 0.81 for the German, 30 0.84 for the Spanish, 31 0.85 for the Turkish, 32 0.88 for the Greek, 33 and 0.86 for the Japanese, 34 confirming the homogeneity of the items of this scale. Articles that translated the QBPDS into Dutch, 35 Turkish 36 and Arabic 37 found values between 0.92 and 0.95, close to ours (0.92). The WDI obtained the lowest internal consistency value (0.78) of the four scales. This finding corroborates a previous study that reported Cronbach's alpha values between 0.86 and 0.96. 25 The fact that this scale does not take work, physical and personal care activities into account, and that its questions are not tied to a specific period of time, might explain its lower internal consistency. Finally, the BPFS had a high internal consistency value (0.93), in line with a prior study that also reported a value of 0.93. 12 When we consider validity by comparing the scales investigated with the SACQ comorbidity scale, our results indicate that all the scales show different correlation patterns between low back pain and several comorbidities. The correlations between the scales themselves ranged from moderate to high. In a previous study validating the Arabic version of the QBPDS, the scale showed high correlations with the Oswestry Disability Index (ODI) and the Numeric Pain Rating Scale (NPRS). 37 The same scale translated into Persian was correlated with the RMDQ and showed an excellent correlation (QBPDS = 0.92 and RMDQ = 0.83). 29 In their statistical analysis of the Persian translations of both the RMDQ and the QBPDS, Mousavi et al. 29 found, as we did, a relationship between low back pain and degrees of disability.
The same authors also found a significant relationship between these scales and the measurement of patient pain.
However, we cannot guarantee a direct relationship between the pain that patients feel and their actual physical disability, a point also observed by Maaroufi et al. 28 and by Stratford, Binkley and Riddle, 12 who established a significant correlation between the RMDQ and pain measurement (r = 0.32, p = 0.005) but failed to associate the scale with other variables, such as pain duration.
The BPFS showed a correlation in the same direction as the disability variable. 12

CONCLUSION
All the scales presented good indicators of reliability and internal consistency, with the WDI having the lowest reliability value, although it was still acceptable. Therefore, since reliability was similar across scales, we cannot recommend one scale over another based on the magnitude of the absolute internal consistency values.
One limitation of this study was the difficulty in finding articles in the literature that explore the factor structure of these scales. However, this did not affect our results.
A positive aspect was the use of MTurk, which allowed us to recruit a heterogeneous sample of participants quickly. This supports MTurk as an effective means of data collection for further studies.
Further studies using our sample will be conducted toward equating the scale scores.
All authors declare no potential conflict of interest related to this article.