
Differential Item Functioning in the Beck Depression Inventory

Abstracts

INTRODUCTION:

There are several studies showing the presence of Differential Item Functioning (DIF) in some items of the Beck Depression Inventory (BDI), when comparing men and women. The presence of a large number of items with DIF in BDI is a severe threat to the validity of measurement of the intensity of depressive symptoms obtained by Item Response Theory (IRT) and to the conclusions based on the scores derived from the items with or without DIF.

OBJECTIVE:

The objectives of this study were to identify these items in the BDI, fit the IRT model for embarrassing items (model 2), which accommodates items with DIF, and compare the results with the fit of the traditional two-parameter logistic IRT model (model 1).

METHODS:

The results obtained with both models were compared.

RESULTS:

Items with DIF were: sadness, feeling of failure, dissatisfaction, guilt, punishment, crying, fatigability and loss of libido. The results of fitting the two models are similar in discrimination, severity (except for items with DIF), and in the calculation of scores for individuals. Nevertheless, model 2 is beneficial because it shows the differences in severity of depressive symptoms between the groups evaluated, thus providing more information to the researcher on the study population.

CONCLUSION:

This model, which has a broader scope in terms of target population, may be a good alternative to the identification and follow-up of individuals with potential depression.

Item Response Theory; Differential Item Functioning; Intensity of Depressive Symptoms; Beck Depression Inventory; Latent trait; IRT Model for embarrassing items



INTRODUCTION

A latent trait is a variable that cannot be observed directly. To measure it, one must use an instrument composed of items that presumably reflect it. Establishing measurement equivalence between groups that differ in characteristics such as schooling, gender and race is important in the evaluation of mental health, so that these groups may be compared in terms of their measures on traits of interest, such as intensity of depressive symptoms, physical functioning or satisfaction with care [1]. Therefore, before comparing groups of respondents (by age or gender, for example) on the latent trait being measured, one must be confident that the items comprising the measure operate equivalently across the groups [1]. In other words, some items, especially in psychological and/or psychiatric measures, may work differently or be biased depending on the respondent group [2]. If an item has a different response function in each of two groups, the item is said to be biased [3].

In the literature on Item Response Theory (IRT), the term bias has essentially been replaced by the expression Differential Item Functioning (DIF). DIF occurs when the probability of a given response to an item does not relate to the latent trait in the same way in two or more respondent groups, i.e., when the probability of choosing a response category depends not only on the individual's latent trait but also on the group to which the individual belongs (for example, when men and women with the same latent trait have different probabilities of choosing a response category). More specifically, DIF occurs when an item presents a different Item Characteristic Curve (ICC) for each group or, equivalently, when any parameter of the item differs between the groups. The answers to a bias-free item are related only to the level of the latent trait that the item is trying to measure. The answers to a biased item are also related to some factor other than the latent trait.

Many measuring instruments, especially in psychiatry, have items that may work differently in different groups. Among them is the Beck Depression Inventory (BDI), an instrument that estimates the latent trait Intensity of Depressive Symptoms. Some studies report the presence of items with DIF in the BDI concerning gender [4-6]. Differences between the response distributions of men and women were observed in the items regarding crying, punishment, loss of libido, dissatisfaction, guilt and fatigability.

The presence of a large number of items with DIF in the BDI is a severe threat to the validity of the measure of intensity of depressive symptoms obtained by IRT and to the conclusions based on the scores derived from the items with and without DIF. A possible solution would be to eliminate those items from the instrument. However, this could compromise the measurement of the latent trait, because the items carry information considered relevant: the BDI was built to encompass all observable depressive symptoms [7]. A model that keeps all items in the instrument and, at the same time, accounts for the differences between groups is therefore an attractive alternative for the analysis of BDI data.

The IRT model for embarrassing items, proposed by Cúri et al. [8], fits this perspective, since it preserves these characteristics. Thus, this study aimed to identify BDI items with DIF for gender, i.e., items that are biased when comparing men and women, through differential item functioning analysis; to fit the model for embarrassing items to the sample considered; and to compare these results with those from fitting the traditional two-parameter logistic IRT model.

METHODS

SAMPLE

The individuals come from a cross-sectional study carried out for the adaptation, standardization and validation of the Beck Scales in Portuguese, conducted by Dr. Jurema Alcides Cunha and published in 2001 [9].

For the objectives of this work, the BDI response scale, originally with 4 points, was dichotomized so that the response takes the value 1 (Xij = 1) when individual j reports having the symptom described in item i (i.e., chooses one of the categories scored 1, 2 or 3 for that item) and 0 (Xij = 0) when the individual does not report the symptom.
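The dichotomization rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `dichotomize` and the response vector are hypothetical and do not come from the study data.

```python
def dichotomize(score: int) -> int:
    """Map an original BDI item score (0-3) to absence (0) or presence (1)."""
    if score not in (0, 1, 2, 3):
        raise ValueError(f"BDI item scores must be 0, 1, 2 or 3, got {score!r}")
    # Categories 1, 2 and 3 all count as "symptom present".
    return 1 if score >= 1 else 0


# Hypothetical responses of one individual to five items (not study data).
responses = [0, 2, 3, 1, 0]
binary = [dichotomize(s) for s in responses]
print(binary)
```

Collapsing categories 1 to 3 discards the graded information in the original scale, which is the limitation the authors return to in the conclusion.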

CONSIDERED MODELS

Two IRT models were adopted for dichotomous variables (in this case being the absence or presence of the depressive symptom).

Unidimensional two-parameter logistic model (Model 1)

This is an IRT model for dichotomous responses, appropriate for measures in which the items do not discriminate the levels of the latent trait equally [2,10]. The two-parameter model gives the probability that individual j presents the symptom measured in item i, conditional on the individual's intensity of depressive symptoms, i.e., P(Xij = 1 | θj, ζi), as follows:

$$P(X_{ij} = 1 \mid \theta_j, \zeta_i) = \frac{1}{1 + e^{-a_i(\theta_j - b_i)}}$$

where: i = 1, ..., 21 items; j = 1, ..., n individuals; ζi = (ai, bi)t; θj is the intensity of the depressive symptoms (latent trait) of individual j (parameter of the individual); bi is the severity (position) parameter of item i and represents the severity of the depressive symptom described by item i (when θj = bi, the probability of presence of symptom i is 0.5); ai is the discrimination (or slope) parameter of item i.
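As a small illustration of how the two parameters shape the response probability, the sketch below evaluates a two-parameter logistic curve. It assumes the standard logistic form without a scaling constant; the function name `icc_2pl` is hypothetical, and the parameter values are the estimates reported later in the text for suicidal ideas (item 9).

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of presenting the symptom under a 2-parameter logistic model.

    theta: intensity of depressive symptoms (latent trait)
    a:     discrimination (slope) of the item
    b:     severity (position) of the item
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Estimates reported in the Results for suicidal ideas (item 9).
a9, b9 = 1.71, 0.93

# At theta equal to the item severity the probability is exactly 0.5.
print(icc_2pl(b9, a9, b9))

# The probability increases with theta; a larger a makes the rise steeper.
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(theta, round(icc_2pl(theta, a9, b9), 3))
```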

IRT model for embarrassing items (Model 2)

This model for dichotomous items, proposed by Cúri et al. [8], allows the severity of depressive symptoms to differ between individuals who are embarrassed and those who are not embarrassed by a specific item, so that each group has its own ICC. The probability that individual j has or does not have the symptom measured in item i (Xij = 1 or 0, respectively) and feels embarrassed or not by item i (Cij = 1 or 0, respectively) is:

$$P(X_{ij} = x, C_{ij} = c \mid \theta_j) = \left[(1 - \delta_i)\, P_{ij}^{\,x}\,(1 - P_{ij})^{1-x}\right]^{1-c} \left[\delta_i\, (P_{ij}^{*})^{x}\,(1 - P_{ij}^{*})^{1-x}\right]^{c}$$

where:

$$P_{ij} = \frac{1}{1 + e^{-a_i(\theta_j - b_{1i})}}, \qquad P_{ij}^{*} = \frac{\gamma_i}{1 + e^{-a_i(\theta_j - b_{2i})}}$$

θj is the intensity of the depressive symptoms (latent trait) of individual j (individual's parameter); b1i is the severity parameter of item i for individuals who are not embarrassed, hereafter called the group with standard behavior (women); b2i is the severity parameter of item i for embarrassed individuals, hereafter called the group with different behavior (men); ai is the discrimination (or slope) parameter of item i; γi is the probability that an individual in the different behavior group states having the depressive symptom, i.e., the probability that an embarrassed individual who has the symptom actually reports it (in the not embarrassed group, this probability is assumed to be 1); δi is the probability that an individual presents different behavior in relation to symptom i. In this study, the classification of individuals as embarrassed or not embarrassed is given by gender, that is, Cij = 1 for men and 0 for women.

In addition to the discrimination parameter of the item, common to other IRT models, this model estimates parameters that capture the different functioning of the items presenting DIF. For those items the two groups can be compared, but not by looking at the severity parameters directly: b1i and b2i correspond to different probabilities of an individual presenting the symptom. The proper comparison between severities is between b1i and θ*0.5.

Note that b1i, as in the 2-parameter logistic model, may be interpreted as the intensity of depressive symptoms at which an individual with standard behavior has probability 0.5 of presenting symptom i (when θj = b1i, Pij = 0.5). For individuals with different behavior, however, when θj = b2i, Pij* = γi/2. For this reason, comparing b1i and b2i directly does not make sense. In this work, the severity of a symptom assessed by an item with DIF is interpreted by comparing the intensities of depressive symptoms at which the probability of presenting the symptom is 0.5 in each of the 2 groups. In the group with standard behavior this is θj = b1i; in the group with different behavior it is θj = b2i - (1/ai) ln[(γi - 0.5) / 0.5], whose estimate is denoted θ*0.5. This value exists only when γi > 0.5.
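The relation between b2i, γi and the 0.5-probability point can be checked numerically. The sketch below assumes, as the text implies, that the response probability in the different behavior group is γi times a logistic curve centered at b2i. The function names and the parameter values are hypothetical and are not estimates from Table 2.

```python
import math


def icc_different(theta: float, a: float, b2: float, gamma: float) -> float:
    """Probability of reporting the symptom in the group with different behavior."""
    return gamma / (1.0 + math.exp(-a * (theta - b2)))


def theta_half_different(a: float, b2: float, gamma: float) -> float:
    """Latent trait level at which icc_different equals 0.5.

    Solves gamma / (1 + exp(-a * (theta - b2))) = 0.5 for theta.
    A solution exists only if gamma > 0.5, because the curve never exceeds gamma.
    """
    if not 0.5 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0.5, 1] for the 0.5 point to exist")
    return b2 - math.log((gamma - 0.5) / 0.5) / a


# Hypothetical item parameters, chosen only for illustration.
a, b2, gamma = 1.5, 1.0, 0.9

theta_star = theta_half_different(a, b2, gamma)
print(round(theta_star, 4))                       # 0.5-probability point
print(icc_different(theta_star, a, b2, gamma))    # approximately 0.5
print(icc_different(b2, a, b2, gamma))            # gamma / 2
```

When γi = 1 the logarithm vanishes and θ*0.5 coincides with b2i; the smaller γi is, the further θ*0.5 lies above b2i.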

ANALYSIS STRATEGY

The analysis of differential item functioning was performed with the technique known as the Item Response Theory Log-Likelihood Ratio (IRTLR) [11], using the IRTLRDIF software, version 2.0b, developed by Dave Thissen and available on his homepage [12]. This procedure follows Frederic Lord's definition of DIF (then called item bias) and uses the log-likelihood ratio test as a significance test of the null hypothesis that the parameters of an item's response function do not differ between groups; a significant result indicates the detection of DIF. For parametric IRT models, the set of item parameters is isomorphic to the item response function (it determines its shape), so testing the parameters is equivalent to testing the function. The IRTLRDIF software implements two of the most widely used IRT models: the 3-parameter logistic model and Samejima's graded response model for polytomous items [13]. The 2-parameter logistic model (used here to identify the DIF items) is a special case of both and, in this software, is implemented as a graded response model with two response categories. Because of the sample size, the significance level used for the identification of DIF items was 1%.
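The decision rule of a log-likelihood ratio test can be sketched as follows. This is not the IRTLRDIF implementation: it only shows the final comparison, assuming that the log-likelihoods of a constrained model (item parameters equal across groups) and a free model (item parameters estimated per group) are already available. It also assumes 2 degrees of freedom, one for each parameter of the 2-parameter model, for which the chi-square tail probability has the closed form exp(-x/2). The function names and log-likelihood values are hypothetical.

```python
import math


def lr_statistic(loglik_constrained: float, loglik_free: float) -> float:
    """G2 = -2 * (log-likelihood of constrained model - log-likelihood of free model)."""
    return -2.0 * (loglik_constrained - loglik_free)


def chi2_sf_df2(x: float) -> float:
    """Upper tail probability of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def has_dif(loglik_constrained: float, loglik_free: float, alpha: float = 0.01) -> bool:
    """Flag the item as showing DIF when the test is significant at level alpha."""
    g2 = lr_statistic(loglik_constrained, loglik_free)
    return chi2_sf_df2(g2) < alpha


# Hypothetical log-likelihoods for two items (not values from the study).
print(has_dif(-10234.7, -10228.1))  # G2 of about 13.2, p of about 0.001
print(has_dif(-10230.0, -10228.1))  # G2 of about 3.8, p of about 0.15
```

With 2 degrees of freedom and a 1% level, the item is flagged when G2 exceeds about 9.21.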

Model 1 (the 2-parameter logistic IRT model) and model 2 (the model for embarrassing items) were fitted with routines written in WinBUGS, version 1.4.3 [14]. The routines for both models used Bayesian parameter estimation through Markov Chain Monte Carlo (MCMC) simulation.

This study was submitted to and approved by the Research Ethics Committee of the Universidade Federal do Rio Grande do Sul (UFRGS), meeting No. 37, minute No. 117, October 30, 2008.

RESULTS

The demographic characteristics of the sample may be found in the article Teoria da resposta ao item aplicada ao Inventário de Depressão Beck [15], in which Samejima's graded response model was fitted to these data. It is noteworthy that the sample is divided almost equally between men and women, with slightly more of the latter.

The items presenting DIF, according to the log-likelihood ratio technique, were: sadness, feeling of failure, dissatisfaction, guilt, punishment, crying, fatigability and loss of libido. The results of fitting model 1 are in Table 1, and the results of fitting model 2, considering the 8 items presenting DIF and the male group as the individuals embarrassed by those items, are in Table 2.

Table 1.
Mean and standard deviation of the posterior distribution of parameters in the 2-parameter logistic model (model 1).
Table 2.
Mean and standard deviation of the posterior distribution of parameters of the model for embarrassing items (model 2).

The estimates of the discrimination parameters in models 1 and 2 (Tables 1 and 2, respectively) indicate that essentially all items may be considered appropriate in this respect (ai > 1) [8,10], except weight loss and self-reproach. The items with the highest discriminating power are feeling of failure and dissatisfaction.

From the severity estimates (b̂i) of the depressive symptoms (Table 1), self-reproach and irritability are the least severe symptoms, and weight loss and suicidal ideas the most severe. It is noteworthy that weight loss is the most severe depressive symptom and, at the same time, the one that discriminates the population least (â19 = 1.20). Suicidal ideas, in turn, is the second most severe symptom (b̂9 = 0.93) and discriminates the population well with respect to the severity of depressive symptoms (â9 = 1.71).

As for symptom severity under model 2, the results are the same as those of model 1 for all BDI items without DIF. The differences occur in the eight remaining items. Guilt is more likely to be observed at higher levels of depressive symptom intensity among women (b̂1,5 = 0.58) and at lower levels among men (θ*0.5 = 0.53). The opposite holds for sadness, feeling of failure, dissatisfaction, punishment, loss of libido, crying and fatigability. For instance, loss of libido is more likely to be observed at lower intensity levels of depressive symptoms among women (b̂1,21 = 0.48) and at higher levels among men (θ*0.5 = 1.50). Also from model 2, the estimated probability that a man with a high intensity of depressive symptoms reports the symptoms of sadness, feeling of failure, dissatisfaction, guilt, punishment, crying, fatigability and loss of libido is at least 88% (γ̂i ≥ 0.88). Figure 1 shows the ICCs produced by models 1 and 2 for item 21, loss of libido. The advantage of model 2 over model 1 is evident here, since the difference in severity of a DIF item between the two compared groups is clearly shown.

Figure 1.
Item Characteristic Curve (ICC) for the symptom loss of libido (item 21) according to the 2-parameter logistic model (1) and for the Embarrassing Item model (2) for male and female gender.

The levels of depressive symptoms estimated under the IRT models are on the same scale as the symptom severity estimated for each BDI item; therefore, they are comparable. The 95th percentiles of the depressive symptom intensity levels are 1.598 and 1.593 for models 1 and 2, respectively. Of the 201 individuals with depressive symptom intensity above the 95th percentile in each model, 194 are classified in the same way by both models. The characteristics of this group (Table 3) show that almost 80% come from the psychiatric group, approximately 68% are women, most (over 58%) do not have a partner and they are, on average, 37 years old.

Table 3.
Description of individuals with high level of depressive symptoms, estimated as a value above the 95th percentile.

The estimates of depressive symptom intensity obtained with models 1 and 2 are highly associated, with a correlation coefficient of 0.99.
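The two comparisons reported above (agreement in the highest-scoring 5% and correlation between the scores) can be sketched as follows. The score vectors are synthetic and purely illustrative, the function names are hypothetical, and selecting the top 5% by rank is used here as a simple stand-in for the cutoff at the 95th percentile.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def top_fraction(scores: list[float], fraction: float = 0.05) -> set[int]:
    """Indices of the individuals whose scores are among the highest `fraction`."""
    k = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


# Synthetic scores for 40 hypothetical individuals under two models.
theta_model1 = [(i - 20) / 10 for i in range(40)]
theta_model2 = [t + 0.03 * (-1) ** i for i, t in enumerate(theta_model1)]

top1 = top_fraction(theta_model1)
top2 = top_fraction(theta_model2)

print(round(pearson(theta_model1, theta_model2), 4))
print(len(top1 & top2), "of", len(top1), "classified in the top 5% by both models")
```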

DISCUSSION

We used the two-parameter logistic model (model 1) as the comparison for the model for embarrassing items (model 2) because both include parameters for discrimination and for the severity of depressive symptoms. Other studies have already used the 2-parameter logistic model for psychiatric data: Schaeffer [16], in 1988, fitted this model to the responses for 11 depression symptoms with 4 answer categories ("never", "once up until now", "relatively often" and "many times"), and Kessler et al. [17] used it in building 2 mental health scales (one with 10 items and the other with 6).

The findings on the presence of DIF in eight BDI items, obtained with model 1, show that men and women with the same depressive symptom intensity responded differently to the items sadness, feeling of failure, dissatisfaction, guilt, punishment, crying, fatigability and loss of libido. Several studies [4-6,18-24] corroborate these findings, although the DIF most often reported is that of the item crying in relation to gender. Many of the studies that show a gender difference in crying emphasize that women tend to cry more often than men [5,6,21]. This may reflect the well-known tendency of women to cry more easily and intensely than men in a variety of distressing situations, rather than indicate a gender difference in the prevalence of depression [18]. This suggests that crying as a response to distress is largely determined by gender; therefore, men and women with the same intensity of depressive symptoms will probably not answer the item crying in the same way, which is confirmed in this study.

Originally, each BDI item has four categories. For crying in particular, the highest-scored category states that the individual has lost the ability to cry even when wanting to, while the first three categories describe increasing frequency of crying. Of all the men who scored 1 on the dichotomous scale, over half chose category 3; the same holds when only the men in the group with the 5% highest estimated levels of depressive symptom intensity are considered, which shows they are serious candidates for a positive diagnosis of depression. This loss of the capacity to cry among men is also present in the study by Hammen and Padesky [4], in which the BDI is used in its original scale.

When comparing the results of models 1 and 2 in terms of how well the items discriminate depressive symptoms, and considering items with ai ≥ 1 [8,10] as having reasonable discrimination, the same 19 items fall in this category in both models; the only exceptions are weight loss and self-reproach. In the study by Cúri et al. [8], in which a three-parameter logistic model was fitted, only loss of appetite had an estimate below this cutoff point, although the symptom of weight loss was very close to it. On the other hand, the most discriminating symptoms, feeling of failure and dissatisfaction, are the same in models 1 and 2 and in the model fitted by Cúri et al. [8], showing that these are important symptoms for discriminating the population according to the intensity of depressive symptoms.

A result shown by model 2 was the greater severity of the symptom loss of libido for men than for women, since it is more likely to occur at higher levels of depressive symptom intensity in men than in women. The importance of loss of libido for men is shown in several studies. In a randomized clinical trial on the sexual effects (such as improvement in loss of libido and erectile dysfunction) of testosterone replacement in men diagnosed with major depression [25], the authors intended to verify whether the treatment would be as effective in this population as in the general population. However, testosterone replacement did not have the expected effect, suggesting that the problem may lie in the depressive condition of the target population.

The groups formed by the 5% of individuals with the highest estimated intensity of depressive symptoms (the latent trait being measured), obtained from models 1 and 2, show a predominance of women in the psychiatric group, since over 75% of these groups are women. These data are consistent with the evidence that depression is two to three times more common among adolescent and adult women than among adolescent and adult men [26], since these women have higher levels of depressive symptom intensity and are strong candidates for a positive diagnosis of depression.

It is important to emphasize that models 1 and 2 place essentially the same individuals in the groups with the highest estimated intensity of depressive symptoms. Of the 201, only 7 women and 7 men were classified differently: model 1 places more women above the 95th percentile, and model 2 places more men above its respective 95th percentile. These differences seem to occur because the estimated intensity levels of depressive symptoms for these individuals lie at the boundary of the respective 95th percentile.

CONCLUSION

Two IRT models were fitted to the dichotomized BDI data: the 2-parameter logistic model (model 1) and the IRT model for embarrassing items (model 2), which accommodates items with DIF.

The results of models 1 and 2 are quite similar, especially the estimates of the intensity of depressive symptoms for each individual, as shown by the high correlation between the IRT scores. Even so, model 2 is advantageous, since it shows the differences in the severity of depressive symptoms between the evaluated groups and thus gives the researcher more information on the studied population. A model with a wider reach in terms of target population may also be a very useful alternative in the clinical field, where validated models may contribute to the identification of potentially depressed individuals.

A limitation of this work is that it consists of an empirical comparison; a broader study, using simulated data for example, is needed.

Moreover, as noted by Cúri et al. [8], model 2 needs to be extended to items with ordinal responses. Like the BDI, countless psychiatric measuring instruments have ordinal items, and transforming them into dichotomous items (absence or presence, for example) does not make full use of the available information and may produce inconsistent results.

REFERENCES

  • 1
    Teresi JA, Fleishman JA. Differential item functioning and health assessment. Qual Life Res 2007; 16(Suppl 1): 33-42.
  • 2
    Embretson SE, Reise SP. Item Response Theory for Psychologists. New Jersey: Lawrence Erlbaum Associates; 2000.
  • 3
    Lord F. Applications of item response theory to practical testing problems. Hillsdale: Routledge; 1980.
  • 4
    Hammen CL, Padesky CA. Sex differences in the expression of depressive responses on the Beck Depression Inventory. J Abnorm Psychol 1977; 86(6): 609-14.
  • 5
    Santor D, Ramsay J, Zuroff D. Nonparametric item analyses of the Beck Depression Inventory: evaluating gender item bias and response option weights. Psychol Assess 1994; 6: 255-70.
  • 6
    Salokangas RK, Vaahtera K, Pacriev S, Sohlman B, Lehtinen V. Gender differences in depressive symptoms. An artefact caused by measurement instruments? J Affect Disord 2002; 68(2-3): 215-20.
  • 7
    Beck AT, Steer RA. Beck Depression Inventory. Manual. San Antonio, TX: Psychological Corporation; 1993.
  • 8
    Cúri M, Singer JM, Andrade DF. A model for psychiatric questionnaires with embarrassing items. Stat Methods Med Res 2001; 20(5): 451-70.
  • 9
    Cunha JA. Manual da versão em português das Escalas Beck. São Paulo: Casa do Psicólogo; 2001.
  • 10
    Andrade DF, Tavares HR, Valle RC. Teoria da Resposta ao Item: conceitos e aplicações. In: Anais do 14º SINAPE; 2000 jul 28; Caxambu (MG).
  • 11
    Teresi JA, Ocepek-Welikson K, Kleinman M, Cook KF, Crane PK, Gibbons LE, et al. Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): applications (with illustrations) to measures of physical functioning ability and general distress. Qual Life Res 2007; 16(Suppl 1): 43-68.
  • 12
    Thissen D. Dave Thissen's Front Page. Available at www.unc.edu/~dthissen/dl.html (accessed July 26, 2008).
  • 13
    Samejima F. Estimation of latent ability using a response pattern of graded scores. Madison (WI): Psychometric Society; 1969.
  • 14
    Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 2000; 10: 325-37.
  • 15
    Castro SMJ, Trentini C, Riboldi J. Teoria da resposta ao item aplicada ao Inventário de Depressão Beck. Rev Bras Epidemiol 2010; 13(3): 487-501.
  • 16
    Schaeffer NC. An application of item response theory to the measurement of depression. Sociol Methodol 1988; 18: 271-307.
  • 17
    Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SLT, et al. Short screening scales to monitor population prevalence and trends in non-specific psychological distress. Psychological Medicine 2002; 32: 959-76.
  • 18
    Romans SE, Tyas J, Cohen MM, Silverstone T. Gender differences in the symptoms of major depressive disorder. J Nerv Ment Dis 2007; 195(11): 905-11.
  • 19
    Stommel M, Given BA, Given CW, Kalaian HA, Schulz R, McCorkle R. Gender bias in the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D). Psychiatry Res 1993; 49(3): 239-50.
  • 20
    Wilhelm K, Parker G, Asghari A. Sex differences in the experience of depressed mood state over fifteen years. Soc Psychiatry Psychiatr Epidemiol 1998; 33(1): 16-20.
  • 21
    Carter JD, Joyce PR, Mulder RT, Luty SE, McKenzie J. Gender differences in the presentation of depressed outpatients: a comparison of descriptive variables. J Affect Disord 2000; 61(1-2): 59-67.
  • 22
    Gelin M, Zumbo B. Differential item functioning results may change depending on how an item is scored: an illustration with the Center for Epidemiologic Studies Depression Scale. Educ Psychol Meas 2003; 63: 65-74.
  • 23
    Wenzel A, Steer RA, Beck AT. Are there any gender differences in frequency of self-reported somatic symptoms of depression? J Affect Disord 2005; 89(1-3): 177-81.
  • 24
    Angst J, Gamma A, Gastpar M, Lepine JP, Mendlewicz J, Tylee A; Depression Research in European Society Study. Gender differences in depression. Epidemiological findings from the European DEPRES I and II studies. Eur Arch Psychiatry Clin Neurosci 2002; 252(5): 201-9.
  • 25
    Seidman SN, Roose SP. The sexual effects of testosterone replacement in depressed men: randomized, placebo-controlled clinical trial. J Sex Marital Ther 2006; 32(3): 267-73.
  • 26
    Beyer JL, Nash J, Shelton R, Loosen PT. Transtorno depressivo maior. In: Jorge MR. Manual diagnóstico e estatístico de transtornos mentais. 4ª edição. Porto Alegre: Artmed; 2000. p. 288-324.
  • Financing source: none.

Publication Dates

  • Publication in this collection
    Mar 2015

History

  • Received
    13 Dec 2012
  • Reviewed
    29 Mar 2013
  • Accepted
    05 Jun 2013
Associação Brasileira de Saúde Coletiva, Av. Dr. Arnaldo, 715 - 2º andar - sl. 3 - Cerqueira César, 01246-904 São Paulo SP, Brazil. Tel./FAX: +55 11 3085-5411
E-mail: revbrepi@usp.br