Predicting response to treatment and discriminating bipolar from unipolar depressive symptoms using the Hamilton Depression Rating Scale

Objective: The present study aims to compare the diagnostic ability of the 17-item HAMD with shorter 7- and 6-item versions. Methods: A total of 133 patients with mood disorders (60.2% with major depressive disorder and 39.8% with bipolar I disorder) were recruited from a 6-month clinical trial. Results: The 17-item HAMD showed results similar to those of the shorter versions. Furthermore, patients diagnosed with major depressive disorder scored higher than those with bipolar disorder in almost all comparisons, but the difference was not significant. Conclusion: These results suggest that a shorter version of the HAMD may be an adequate option and that depressive symptoms were similar between groups.


INTRODUCTION
Mood disorder rating scales are essential for both research and clinical practice¹. In clinical trials, efficient measurement procedures are needed to evaluate drug efficacy against placebo or a gold-standard drug. In clinical settings, rating scales provide a structured assessment of patients' symptoms that helps clinicians evaluate them in a more objective and standardized way². Tools to assess depressive symptoms were therefore needed to establish a common language for what is understood as depression. Max Hamilton initially developed a scale consisting of 17 descriptors 3, which was subsequently validated in different countries 1,4-8, among them Brazil, where studies confirmed its ability to aid in the diagnosis of depressive disorder and in the detection of severity 9-11.
Regarding psychometric properties, there is no consensus about reliability coefficients 1,12,13. Bagby et al. 12 found that the HAMD-17 is psychometrically and conceptually inconsistent. Reviewing 70 studies published between 1979 and 2003, selected from the MEDLINE database, they found adequate internal reliability but poor interrater and retest reliability; validity results indicated poor content validity but adequate convergent and discriminant validity. These findings suggest that the psychometric evidence is inconclusive regarding the consistency of the HAMD-17 in evaluating depression and that more studies are necessary. In contrast, Trajković et al.¹ performed a meta-analysis (data obtained from a review of 409 articles available in MEDLINE and PsycINFO published between 1960 and 2008) of the reliability of the HAMD-17, which indicated good properties in patients with depression as a primary diagnosis and in those with comorbidities, as well as a good internal consistency index.
The multidimensional configuration of the HAMD is a recurring point of discussion in the literature. Some argue that depression results from multiple causes, which makes it inappropriate to define depression as a one-dimensional condition. Thus, a scale with broad coverage is appropriate when depression is assumed to result from a set of clinical features; however, the scale must also generate one-dimensional subscales for each clinical feature in order to evaluate the clinical results of pharmacological studies. On the other hand, some researchers argue that scores could be inflated by items that are not specific to depressive symptoms, for example hypochondriasis and anxiety, which directly affect estimates of depression severity and of perceived change during a depressive episode 13-18. Short versions of the HAMD have therefore been developed to improve its psychometric characteristics and consequently reduce such inflated results, focusing on core symptoms and excluding items related to medication effects or comorbidities.
Comparing the unidimensional HAMD-17 and MADRS six-item core symptom subscales, Maier et al. 19 found that internal consistency and sensitivity to change were similar and recommended the use of shorter versions in clinical practice. Faries et al. 20, studying the utility of unidimensional core-symptom HAMD subscales versus the full HAMD, concluded that inadequate measures of effectiveness can require larger samples and longer recruitment, and can increase financial costs and study complexity while reducing validity. A study comparing the 17- and 6-item versions of the HAMD indicated that the six-item scale was strongly related to the 17-item scale at baseline and at endpoint in 143 MDD patients (with double and melancholic depression) in a trial of four antidepressant drug treatments. These results led the authors to conclude that the six-item scale is apparently as sensitive to change over time as the 17-item scale 21. Bech et al. 15 found that a reduced six-item HAMD scale yields a unidimensional measure of the severity of depressive states and is more sensitive to change than the full scale.
Isacsson and Adler 22 reanalyzed data from six randomized clinical trials to investigate whether the HAMD six-item subscale might explain findings of low or absent antidepressant efficacy in 597 patients with MDD with HAMD scores below eight points. Comparing the HAMD-17 and the six-item scale, they concluded that the six-item version performed better at explaining a large proportion of variance and that the HAMD-17 provides unreliable data, such as low effect sizes and low sensitivity to change. They suggested that the HAMD-17 could be inappropriate for assessing depression severity.
Helmreich et al. 23 investigated the ability of the HAMD-17 and of a HAMD core depressive symptoms version to predict treatment outcome. The core version includes depressed mood, psychic anxiety, low self-esteem, feelings of guilt, and work and activities; other items, such as suicide, agitation, retardation, somatic anxiety, general somatic symptoms and libido, were included separately. The study was based on 210 outpatients with severe MDD evaluated at inclusion, at baseline and after two weeks of antidepressant treatment, and the improvement criterion was a reduction of 20% or more from the baseline evaluation. A sensitivity analysis was performed to determine whether the predictive capacity of the Toronto 7-item scale and the Evans 6-item scale was similar; all subscales showed good sensitivity (80-96%) and moderate specificity (36-54%). Additionally, the items that best discriminated between stable remitters and non-remitters were work and activities and depressed mood. The authors concluded that core versions could be considered as appropriate as the 17-item version for predicting outcomes. The 17-item version was recommended only at baseline and week 2, to predict response or treatment failure in the early phase of treatment, and the Toronto and Evans scales in subsequent weeks. Finally, they suggested the need for more prospective studies comparing the HAMD-17 and six-item subscales to investigate their ability to predict treatment outcome.
Therefore, studies of the psychometric properties and clinical implications of shorter HAMD versions in trials indicate that these scales might be good options for clinical trials; however, clinical data are insufficient. Our study aims to investigate the psychometric properties (validity and reliability) of the HAMD-17 compared with shorter versions in order to identify the best form for detecting change during treatment and, secondarily, to investigate differences between MDD and bipolar depression.

ORIGINAL ARTICLE, J Bras Psiquiatr. 2017;66(3):125-30, HAMD properties in mood disorders
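The improvement criterion used by Helmreich et al. (a reduction of at least 20% from baseline) and the sensitivity/specificity of that early signal for a later outcome can be sketched as follows. This is a minimal illustration, not their analysis code; the function names and the idea of cross-tabulating early improvement against a later remission flag are assumptions made for the example.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage drop in a HAMD total score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_early_improver(baseline: float, week2: float, threshold: float = 20.0) -> bool:
    """Improvement criterion: a reduction of at least `threshold` percent."""
    return percent_reduction(baseline, week2) >= threshold


def sensitivity_specificity(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) of early
    improvement (predicted) for a later outcome such as remission (actual)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for four patients (not data from any study).
baseline = [24, 20, 18, 22]
week2 = [18, 17, 12, 21]
early = [is_early_improver(b, w) for b, w in zip(baseline, week2)]
remitted = [True, False, False, False]
sens, spec = sensitivity_specificity(early, remitted)
```

With these invented values, the first and third patients meet the 20% criterion, giving a sensitivity of 1.0 and a specificity of 2/3 for predicting remission.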

METHODS
This is a follow-up study reporting data obtained from the LICAVAL protocol 24, collected during a six-month clinical trial conducted at a psychiatric institute in Brazil. A total of 133 patients from a Mood Disorders Unit (67.7%, n = 90 females; mean age = 34.50 ± 12.45 years; 36.1%, n = 48 single) were included. Patients had previously been diagnosed with bipolar I disorder (BD I; n = 53) or major depressive disorder (MDD; n = 80) according to DSM-IV-TR 25 criteria and the Structured Clinical Interview for DSM (SCID-I) 26, applied by trained psychiatrists. Exclusion criteria were neurological disorders, previous head trauma, any illness requiring medical intervention, electroconvulsive therapy in the preceding six months, suicidality, comorbidities such as anxiety disorders or substance abuse, and psychotic symptoms.
Assessment: the Hamilton Depression Rating Scale (HAMD) was administered by trained clinicians to evaluate depressive symptomatology prior to treatment (V0) and two (V2) and four (V4) weeks after treatment onset. To compare psychometric properties, the HAMD-17 items were transformed into the following subscales: the Bech melancholia scale 14,15, the Gibbons global depression severity scale 13 and the Toronto scale 27. Patients were additionally assessed with the Montgomery-Åsberg Depression Rating Scale 18 (MADRS), which served as a criterion for comparing response.
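Deriving the short versions from HAMD-17 item scores amounts to summing a subset of items. The sketch below assumes the conventional HAMD-17 item numbering and the item sets commonly cited for the Bech HAMD-6 (items 1, 2, 7, 8, 10, 13) and the Toronto HAMD-7 (items 1, 2, 3, 7, 10, 11, 13); these sets should be checked against references 14, 15 and 27 before use. The Gibbons scale is omitted because its item composition is not stated here, and `SUBSCALES` and `subscale_totals` are hypothetical names.

```python
# ASSUMPTION: conventional HAMD-17 numbering (1 = depressed mood, 2 = guilt,
# 3 = suicide, 7 = work and activities, 8 = retardation, 10 = psychic anxiety,
# 11 = somatic anxiety, 13 = general somatic symptoms) and commonly cited
# item sets for the Bech HAMD-6 and the Toronto HAMD-7.
SUBSCALES: dict[str, tuple[int, ...]] = {
    "HAMD-17": tuple(range(1, 18)),
    "HAMD-6 (Bech)": (1, 2, 7, 8, 10, 13),
    "HAMD-7 (Toronto)": (1, 2, 3, 7, 10, 11, 13),
}


def subscale_totals(items: dict[int, int]) -> dict[str, int]:
    """Sum the HAMD-17 item scores (keyed 1..17) belonging to each version."""
    missing = set(range(1, 18)) - set(items)
    if missing:
        raise ValueError(f"missing HAMD-17 items: {sorted(missing)}")
    return {name: sum(items[i] for i in ids) for name, ids in SUBSCALES.items()}


# Hypothetical patient scoring 1 on every item.
totals = subscale_totals({i: 1 for i in range(1, 18)})
```

Because every short version is a subset of the same 17 ratings, no extra interview time is needed to obtain them.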
Procedures: patients were recruited from a tertiary outpatient Mood Disorders Unit. Bipolar and MDD patients were diagnosed according to the DSM-IV-TR 25 and were included in the study after they had read, understood, and signed the Informed Consent Form, according to the LIthium and CArbamazepine compared to lithium and VALproic acid in the treatment of young bipolar patients (LICAVAL) study protocol 24. The scales took an average of 30 minutes to apply and were administered at entry (V0) and two (V2) and four (V4) weeks after treatment onset.
Analysis: statistical analyses were conducted with SPSS 20.0. Reliability coefficients were calculated using Cronbach's alpha and corrected with the Spearman-Brown prophecy formula 28; sensitivity and specificity were examined through the area under the receiver operating characteristic curve (AUC). Validity was assessed through relationships with internal variables (correlations between items and differences by diagnosis) and through predictive validity (regression analysis). The significance level was set at 0.05, and all tests were two-tailed.
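The two reliability statistics named above can be written out directly. Cronbach's alpha is k/(k - 1) · (1 - Σ item variances / variance of the total), and the Spearman-Brown prophecy predicts reliability after changing test length by a factor n as nρ / (1 + (n - 1)ρ). The following is a standard-library sketch of those textbook formulas, not the SPSS procedure used in the study; the function names are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total))."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))  # transpose rows (patients) to item columns
    item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy: predicted reliability when the test is
    lengthened (factor > 1) or shortened (factor < 1)."""
    r, n = reliability, length_factor
    return n * r / (1 + (n - 1) * r)
```

For example, `spearman_brown(alpha_6, 17 / 6)` would project the alpha of a 6-item version (`alpha_6`, a hypothetical value) to the length of the 17-item scale, one way of comparing versions of unequal length on a common footing.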

RESULTS
Internal consistency reliability coefficients (α) were calculated for the four different versions at V0, V2 and V4 (Table 1). The most reliable stage was V2 and the least reliable was V0; the MADRS showed the best internal consistency coefficient at V0, the HAMD-17 at V2, and the MADRS again at V4. The Spearman-Brown prophecy did not increase the reliability coefficients. Figure 1 shows the area under the curve (AUC) for all HAMD versions and the MADRS, and Table 2 presents the respective values.
Visually, Figure 1 shows that the MADRS had the best AUC at all three stages, which is confirmed in Table 2. In general, the HAMD version with the best AUC was the 7-item version, although it was always below the MADRS; at V0, the 17-item version showed the worst performance. Table 2 also shows the statistical results for the HAMD versions and the MADRS.
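The empirical AUC has a simple interpretation: it is the probability that a randomly chosen case from one group scores higher than a randomly chosen case from the other, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch, assuming MDD is treated as the "positive" group because its means were generally higher; the function name is illustrative.

```python
def auc_mann_whitney(positives: list[float], negatives: list[float]) -> float:
    """Empirical ROC AUC: probability that a positive case (e.g. MDD) scores
    higher than a negative case (e.g. BD); ties count one half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 means the scale does not separate the groups at all, which is the reference point for the low discrimination reported below.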
The results indicated that the sensitivity and specificity for distinguishing MDD from BD were low for the HAMD versions and for the MADRS at all applications, although the MADRS showed better results. Furthermore, sensitivity tended to exceed specificity.
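Sensitivity and specificity depend on the cutoff chosen on the score scale. The sketch below scans every observed score as a cutoff and picks the one that maximizes Youden's J (sensitivity + specificity - 1). The text does not state how cutoffs were selected, so the use of Youden's J here is an assumption for illustration, as are the function names.

```python
def sens_spec_at(cutoff: float, positives: list[float],
                 negatives: list[float]) -> tuple[float, float]:
    """Classify score >= cutoff as positive; return (sensitivity, specificity)."""
    sens = sum(s >= cutoff for s in positives) / len(positives)
    spec = sum(s < cutoff for s in negatives) / len(negatives)
    return sens, spec


def best_cutoff_youden(positives: list[float], negatives: list[float]):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J
    over all observed scores; ASSUMPTION: Youden's J as the selection rule."""
    best = None
    for c in sorted(set(positives) | set(negatives)):
        sens, spec = sens_spec_at(c, positives, negatives)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best
```

A pattern in which sensitivity exceeds specificity at the chosen cutoff means the scale flags most positive cases but also misclassifies many cases from the other group.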
Correlations between the HAMD versions and the MADRS were high, with few coefficients below 0.80. The correlation magnitudes also seemed to increase across applications. In addition, a logistic regression was calculated in order to determine [...]. Table 3 presents the t tests and Cohen's d values, verifying mean differences between groups.
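The correlations between versions are correlations between total scores obtained from the same patients. The text does not say which coefficient was used; the sketch below assumes Pearson's product-moment correlation and is written without external libraries.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two sets of total scores from the same
    patients (ASSUMPTION: Pearson rather than Spearman)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Because the short HAMD versions reuse items from the HAMD-17, correlations among HAMD versions partly reflect shared items, whereas correlations with the MADRS come from an independent instrument.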
The bipolar and MDD groups showed notable differences (d ≥ 0.20) on the 7-item and 6a-item Hamilton versions and on the MADRS; in almost all cases, MDD means were higher than bipolar means. Differences were similar across stages, but the MADRS differences at V0 and V4 were statistically significant in discriminating bipolar from major depressive disorder.
Note: in bold, the test with the best sensitivity and specificity. V0: entry into treatment; V2: after two weeks of treatment; V4: after four weeks of treatment.
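The group comparison in Table 3 combines an independent-samples t test with Cohen's d. A minimal sketch of both statistics, assuming the classic Student t test and Cohen's d with a pooled standard deviation (the text does not specify the variants); obtaining a p-value would additionally require the t distribution with na + nb - 2 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def student_t_and_cohens_d(a: list[float], b: list[float]) -> tuple[float, float]:
    """Independent-samples Student t statistic and Cohen's d, both based on
    the pooled standard deviation (ASSUMPTION: equal-variance variants)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    pooled_sd = sqrt(pooled_var)
    diff = mean(a) - mean(b)
    t = diff / (pooled_sd * sqrt(1 / na + 1 / nb))
    d = diff / pooled_sd
    return t, d
```

Unlike t, d does not grow with sample size, which is why it complements significance tests when, as here, the groups are of moderate size; d ≥ 0.20 corresponds to the conventional "small" effect threshold.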

DISCUSSION
Considering the importance of the HAMD for assessing depression, this study aimed to compare the diagnostic capacity of HAMD versions with that of the MADRS. The psychometric properties of the HAMD have been questioned, and it has been considered an imprecise scale. Attempts to refine it have therefore been proposed, using versions based on core symptoms 8,12,13.
In the present study, HAMD-17 scores were reliable and satisfactory. This is comparable to the study by Trajković et al. 1, which reported good properties in patients with depression (as a primary diagnosis or with comorbidities), and to the review by Bagby et al. 12, which found a Cronbach's alpha above 0.70, indicating adequate internal reliability. Regarding the short versions, internal reliability was also satisfactory; however, the 7-item version was the most adequate. It is also worth mentioning that both the MADRS and the HAMD showed results that confirm their reliability, in line with the idea that the MADRS was designed to be particularly sensitive to change in patients receiving antidepressant medication 18, although the HAMD was the best option at V0 and the MADRS during treatment (V2 and V4). This result might indicate that, during treatment, when core symptoms are more easily detected, the MADRS is better suited to detecting change.
When the area under the curve (AUC) was used to analyze the ability of the HAMD versions and the MADRS to predict cases, the full version showed the worst performance, whereas the HAMD-7 and the MADRS showed the best. Sensitivity reflects the ability of a scale to correctly identify individuals who have the condition, that is, to avoid false-negative results; in this sample, the MADRS and the 7-item HAMD are therefore the most suitable rating scales for evaluating depressed individuals. Nonetheless, reliability results indicated that the short versions have good sensitivity and specificity, as reported by Helmreich et al. 23, and that the HAMD was not as sensitive in specifying depressive symptoms as expected 12,13,17. Additionally, in contrast to the other subscales, the McIntyre et al. 8 subscale showed the best reliability and sensitivity scores, was the version closest to the HAMD-17 in predicting results, and showed the best correlation indices. The MADRS was the only scale able to differentiate depressive symptoms between groups: at entry (V0) and after four weeks of treatment (V4), it detected differences between groups, indicating that the MDD group had more symptoms than the BD group.

CONCLUSION
In conclusion, some limitations should be noted. First, only MDD patients with moderate to severe depression were considered, whereas BD patients with mild depression were also included. This should be taken into account when interpreting differences in depressive symptoms between groups. Second, the sample size was limited. Because classical methods of analysis can be inflated by sample size ("n"), we tried to reduce this bias with complementary methods, such as the Spearman-Brown prophecy and Cohen's d.
Despite these limitations, the findings indicate that core versions, particularly the 7-item HAMD, could be considered appropriate for assessing depressive symptoms among outpatients and are good options for clinical trials, given the time needed to administer the scales. Regarding the 17-item version, results indicated satisfactory internal reliability but unsatisfactory sensitivity in discriminating false cases. As for the MADRS, it was the scale that presented the most reliable results.

INDIVIDUAL CONTRIBUTIONS
Adriana Munhoz Carneiro, Lucas de Francisco Carvalho -Contributed to conception and design, analysis and interpretation of data.
Ricardo Alberto Moreno - Contributed substantially to drafting the article or revising it critically for important intellectual content.
André Cavalcanti - Gave final approval of the version to be published.

Table 1. Internal consistency reliability coefficients
Note: values corrected by the Spearman-Brown prophecy are presented in parentheses. HAMD: Hamilton Depression Rating Scale; MADRS: Montgomery-Åsberg Depression Rating Scale; V0: entry into treatment; V2: after two weeks of treatment; V4: after four weeks of treatment.