The validity and 4-year test-retest reliability of the Brazilian version of the Eating Attitudes Test-26

In a cross-sectional study conducted four years earlier to assess the validity of the Brazilian version of the Eating Attitudes Test-26 (EAT-26) for the identification of abnormal eating behaviors in a population of young females in Southern Brazil, 56 women presented abnormal eating behavior as indicated by the EAT-26 and the Bulimic Investigation Test of Edinburgh (BITE). Each of them was matched for age and neighborhood to two normal controls (N = 112), and all were re-assessed four years later with the two screening questionnaires plus the Composite International Diagnostic Interview (CIDI). The EAT-26 results were then compared to the diagnoses obtained with the CIDI. To evaluate the temporal stability of the two screening questionnaires, a test-retest design was used to estimate kappa coefficients for individual items. The CIDI psychiatric interview was applied to 161 women, and the prevalence of eating disorders was 6.2% (10 cases): 0.6% had anorexia nervosa and 5.6% bulimia nervosa. The validity coefficients of the EAT-26 were 40% sensitivity, 84% specificity, and 14% positive predictive value. Cronbach's alpha coefficient was 0.75. For each EAT-26 item, the kappa index was not higher than 0.344 and the correlation coefficient was lower than 0.488. We conclude that the EAT-26 exhibited low validity coefficients for sensitivity and positive predictive value, and poor temporal stability. It is reasonable to assume that these results were not influenced by the low prevalence of eating disorders in the community. Thus, the results cast doubt on the ability of the EAT-26 to identify cases of abnormal eating behaviors in this population.


Introduction
Several scales have been proposed for the study of disordered eating (1)(2)(3)(4)(5) including the Eating Attitudes Test (EAT-40), developed for the early detection of anorexia nervosa by Garner and Garfinkel in 1979 (6).
The original EAT-40 was designed as a self-report questionnaire focusing on eating attitudes and behavior. The abbreviated version (EAT-26) was derived from a factor analysis of the EAT-40 and has three main factors: dieting (avoidance of fattening foods and preoccupation with thinness), bulimia and food preoccupation (thoughts about food and bulimia), and oral control (self-control about food and social pressure to gain weight) (7). The EAT-26 correlated highly with the EAT-40 (r = 0.98).
According to its authors, the EAT-40 was validated by initially administering the test to two samples of patients with anorexia nervosa (AN) from the Clarke Institute of Psychiatry in Toronto, Canada (N = 32; N = 33). All patients fulfilled the diagnostic criteria for AN of Feighner et al. (17), although they were at different phases of the illness. They were compared to two control groups of students from the University of Toronto (N = 34; N = 59). The final EAT-40 version was also administered to a sample of men (N = 49) and to a sample of obese individuals (N = 16).
Based on their results, the authors concluded that the EAT-26 was an adequate measure for identifying potential cases of AN in at-risk populations, with good sensitivity and specificity (7). Typical results are those of Mann et al. (14), who found that a threshold of 20 on the EAT-26 yielded 88% sensitivity and 96% specificity.
The EAT-26 has been applied to a variety of cultural and age groups in more than 250 studies of clinical and community samples (18). For the present study, we conducted a literature review to identify all EAT studies fulfilling the following criteria: a) the test was used in a community sample, b) standardized criteria for the diagnosis of anorexia and/or bulimia were adopted, and c) the study provided data to estimate the validity coefficients. Nine studies fulfilled these criteria (8)(9)(10)(11)(12)(13)(14)(15)(16), all of them conducted on samples of populations at risk for eating disorders. These studies are summarized in Table 1, except for the Eisler and Szmukler study (10), for which only the positive predictive value could be estimated. Across these studies, sensitivity ranged from 28 to 100%, specificity from 89 to 97% and positive predictive value from 4 to 55%.
In general, it can be concluded that the test has low sensitivity and weak positive predictive values. The Brazilian Portuguese version of the EAT-26 (19) retained the structure and content of the original questionnaire, but its psychometric performance in the Brazilian population had not previously been evaluated. Thus, the aim of the present study was to assess the validity of the EAT-26 for identifying abnormal eating behaviors in a population-based sub-sample of young women living in the urban area of Porto Alegre, Brazil.

Cross-sectional study
Data collection was carried out in two stages. As previously described (20), in the first stage (1998) the prevalence of abnormal eating behaviors was investigated in a representative sample of 513 women aged 12-29 years from the city of Porto Alegre, Southern Brazil (population of 1.5 million). On the basis of demographic data (21), it was estimated that 1524 households would have to be visited. All women aged 12-29 years living in those households were invited to participate (N = 555). Of these, 20 (3.6%) refused to participate and 22 (3.9%) could not be found after at least six visits, leaving 513 women enrolled. Written informed consent was obtained from all participants. The study was approved by the Hospital Ethics Committee in São Paulo.
Twelve interviewers were trained to collect demographic and socioeconomic information. Anthropometric measurements were performed in a standardized fashion: weight was measured in light clothing to the nearest 0.1 kg, and height was measured without shoes using an aluminum anthropometer. Following WHO recommendations, weight status was categorized as thin (body mass index (BMI) < 5th percentile), normal (5th percentile ≤ BMI < 85th percentile) or overweight/obese (BMI ≥ 85th percentile) (22).
The prevalence of abnormal eating behaviors and inadequate methods of weight control was assessed using the Brazilian Portuguese version of the Eating Attitudes Test (EAT-26) (7,19) and the Brazilian Portuguese version of the Bulimic Investigation Test of Edinburgh (BITE) (4,23). Each EAT-26 item is answered on a 6-point Likert scale and scored from 3 to 0, with lower scores indicating absence of the disorder. Total scores can range from 0 to 78, and values of 21 or more correspond to a probable case of AN.
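The scoring rule described above can be sketched in Python. This is a minimal sketch that assumes the standard key of the original EAT-26 (the three most symptomatic options score 3, 2 and 1, the other three score 0, and item 25 is scored in reverse); the option labels, item numbering and function names are illustrative assumptions, and the key of the Brazilian version should be checked before reuse.

```python
from typing import Sequence

# Six response options, from most to least frequent.
OPTIONS = ("always", "usually", "often", "sometimes", "rarely", "never")

# Assumed standard EAT-26 key: the most symptomatic answers score 3, 2, 1;
# the remaining answers score 0.
FORWARD_KEY = dict(zip(OPTIONS, (3, 2, 1, 0, 0, 0)))
# Assumed reverse-scored item(s), numbered from 1 (item 25 in the original key).
REVERSED_ITEMS = {25}
REVERSE_KEY = dict(zip(OPTIONS, (0, 0, 0, 1, 2, 3)))

CUTOFF = 21  # threshold used in this study for a probable case


def eat26_total(answers: Sequence[str]) -> int:
    """Sum the 26 item scores; the total ranges from 0 to 78."""
    if len(answers) != 26:
        raise ValueError("the EAT-26 has exactly 26 items")
    total = 0
    for item, answer in enumerate(answers, start=1):
        key = REVERSE_KEY if item in REVERSED_ITEMS else FORWARD_KEY
        total += key[answer]  # raises KeyError for an unknown response option
    return total


def is_probable_case(answers: Sequence[str]) -> bool:
    """Screen-positive when the total reaches the cut-off."""
    return eat26_total(answers) >= CUTOFF
```

For example, a respondent who answers "often" to every item scores 25 (one point on each of the 25 forward-scored items, none on item 25) and is therefore screen-positive under this assumed key.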
Given the complementary nature of these two tests, operational definitions based on both instruments were devised to categorize the sample into three groups. Scores above the cut-off point on the BITE Symptom Scale and/or the BITE Severity Scale were considered to indicate greater severity than scores above the cut-off point on the EAT-26, as follows. Category 1, likelihood of presenting abnormal eating behavior (LAEB): BITE Symptom Scale ≥20 and/or BITE Severity Scale >4. Category 2, unusual eating patterns: BITE Symptom Scale between 10 and 19, or BITE Severity Scale ≤4, and/or EAT-26 ≥21.
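The Category 1 rule, which defined the LAEB group later used to draw the follow-up sub-sample, can be written as a small predicate. This is an illustrative sketch: the function names are hypothetical, it encodes only Category 1, and it assumes the BITE Symptom and Severity scale totals have already been computed.

```python
from typing import Iterable, Tuple


def is_laeb(bite_symptoms: int, bite_severity: int) -> bool:
    """Category 1 (LAEB): BITE Symptom Scale >= 20 and/or BITE Severity Scale > 4.

    "and/or" is read as an inclusive OR: either criterion alone is sufficient.
    """
    return bite_symptoms >= 20 or bite_severity > 4


def count_laeb(sample: Iterable[Tuple[int, int]]) -> int:
    """Count LAEB women in an iterable of (symptom score, severity score) pairs."""
    return sum(is_laeb(symptoms, severity) for symptoms, severity in sample)
```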
Of the 513 women included in this stage, 56 were identified as having LAEB according to the classification described above.

Follow-up cohort study
The second stage of the study took place 4 years later, in 2002. For this second stage, we tried to contact the 56 women classified as LAEB in the first stage. In addition, for each LAEB woman, we tried to identify two age- and neighborhood-matched controls who had not been classified as LAEB in the original study (112 probable non-cases). This sampling method was chosen to minimize problems related to the low prevalence of eating disorders in community samples. As a result of this sub-sample selection process, the prevalence of LAEB in the sub-sample was deliberately inflated to over 30% (56 of 168, 33%). A thorough search was performed to locate the 168 women selected for this sub-sample. First, the original 1998 address was visited. Many women had moved: 36 to a different city in the same state, 10 to a different state, and two to a foreign country. Several women had also changed their last name after marriage.
Search strategies included contacting relatives, visiting addresses indicated by neighbors, and checking work addresses in the voters' registry. When women had moved to other towns in Brazil, the interviewers traveled to those cities. The two women living abroad were contacted by e-mail and invited to participate in this new stage of the study. As in the first stage, participants answered the EAT-26 and the BITE as well as demographic and socioeconomic questionnaires. In addition, they answered the Composite International Diagnostic Interview (CIDI) (24), which was administered by interviewers blinded to the scores of the screening questionnaires. Because some participants had little schooling, all questionnaires were administered orally by trained interviewers.
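The case-control sub-sampling can be illustrated with a greedy matching sketch. The text does not state how closely ages had to agree or how controls were chosen among eligible women, so the age tolerance (`max_age_gap`), the closest-age-first rule and the `Woman` record below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Woman:
    id: int
    age: int
    neighborhood: str
    laeb: bool  # classified as LAEB in the first stage


def match_controls(
    women: Sequence[Woman], per_case: int = 2, max_age_gap: int = 1
) -> Dict[int, List[int]]:
    """Greedily assign up to `per_case` non-LAEB controls to each LAEB case.

    Controls must live in the same neighborhood and differ in age by at most
    `max_age_gap` years (hypothetical tolerance); closer ages are preferred,
    and each control is used at most once.
    """
    used = set()
    matches: Dict[int, List[int]] = {}
    controls = [w for w in women if not w.laeb]
    for case in (w for w in women if w.laeb):
        candidates = sorted(
            (
                c for c in controls
                if c.id not in used
                and c.neighborhood == case.neighborhood
                and abs(c.age - case.age) <= max_age_gap
            ),
            key=lambda c: abs(c.age - case.age),
        )
        chosen = candidates[:per_case]
        used.update(c.id for c in chosen)
        matches[case.id] = [c.id for c in chosen]
    return matches
```

With 56 cases and two controls each, the LAEB proportion in the resulting sub-sample is 56/168 = 1/3, which is why the sub-sample prevalence exceeds 30% by design.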

Statistical analysis
The t-test for independent samples was used to compare the means of the quantitative variables between the groups of women with and without abnormal eating behavior. The chi-square test was used to determine the association between inadequate eating behavior and qualitative sociodemographic variables. The validity coefficients of the EAT-26 (sensitivity, specificity, positive predictive value and misclassification rate) were estimated using the CIDI diagnoses as the reference standard. Cronbach's alpha coefficient (25) was used to measure the internal consistency of the EAT-26. Three forms of the kappa coefficient (25), namely kappa, kappa with dichotomized items, and balanced kappa, were used to measure the temporal stability of the EAT-26.
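Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal pure-Python sketch follows, assuming a complete matrix of item scores with one row per respondent; the function name is illustrative, and missing-data handling, which the text does not describe, is left out.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Internal consistency of a scale.

    `scores` has one row per respondent and one column per item.
    Sample variances are used throughout; the ratio is the same with
    population variances as long as they are used consistently.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)
```

Applied to the 26 item scores of the sub-sample, this is the statistic reported as 0.75 in the Results.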

Results
Two LAEB women could not be located at follow-up and three women refused to participate in the second stage (two LAEB-positive and one control). Therefore, 3% of the sample (5 of 168) was lost and 163 women were studied. The two women living abroad (1.2%) completed all questionnaires via e-mail except the CIDI, which was therefore applied to 161 women; their weight was not measured.
The sociodemographic characteristics and EAT-26 results of cases and controls are described in Table 2. No significant differences were observed between the groups for any of the demographic features studied. Mean age (± SD) was 24.2 ± 4.0 years. Thirty percent of the women were both studying and working and 40.5% were only working at the time of data collection; 52% had more than 12 years of schooling. Participants came from families with an average monthly income of US$200.
The mean BMI (current weight in kg divided by the square of height in m) of the 161 women was 23.3 kg/m² (SD = 4.6). BMI was within the normal range (18.5-24.9) in 70.7% of the women studied, within the underweight range (BMI < 18.5) in 4%, and within the overweight range (BMI > 24.9) in 25.5%. LAEB women had a significantly higher BMI than controls (LAEB = 25.2; controls = 22.4; P < 0.01). Perceived ideal weight was also significantly higher in LAEB women than in controls (LAEB = 57.5 kg; controls = 54.6 kg; P < 0.008), and the difference between actual and ideal weight was larger in LAEB women than in controls (LAEB = 8.9 kg; controls = 4.0 kg; P < 0.001). Mean height was 1.62 m in both groups.
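The BMI definition and the adult cut-offs used in this paragraph can be sketched as follows; the function names are illustrative, and the thresholds are the ones stated in the text (underweight below 18.5, normal 18.5 to 24.9, overweight above 24.9).

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Adult BMI ranges as used in the Results section."""
    if value < 18.5:
        return "underweight"
    if value <= 24.9:
        return "normal"
    return "overweight"
```

For instance, at the mean height of 1.62 m, a weight of 57.5 kg (the mean ideal weight reported by LAEB women) corresponds to a BMI of about 21.9, inside the normal range.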
CIDI results showed that 0.6% of the women in the sample were diagnosed as having AN and 5.6% as having bulimia nervosa (10 cases with a diagnosis of an eating disorder). The comparison between the EAT-26 (cut-off point ≥21) and the psychiatric interview (concurrent validity) is shown in Table 3. The sensitivity of the EAT-26 was 40% and its specificity 84%. The positive predictive value was 14% and the misclassification rate was 18.4%. Of the 10 women diagnosed by the CIDI, four (40%) had positive EAT-26 scores (true positives). Of the 151 women without a CIDI diagnosis of an eating disorder, 24 (15.7%) had positive EAT-26 scores (false positives). When the CIDI results were compared with the EAT-26 results from the 1998 cross-sectional study (1st stage), the EAT-26 had not predicted any of the 10 cases diagnosed by the CIDI. In addition, 31 (20.3%) of the 151 women without a CIDI diagnosis of an eating disorder had positive EAT-26 scores in 1998. The validity coefficients of the EAT-26 at three different cut-off points are given in Table 4. None of the cut-off points provided an adequate balance between sensitivity and specificity, and both sensitivity (30 to 40%) and positive predictive values (14 to 27%) were very low.
Cronbach's alpha coefficient for the sub-sample was 0.75, indicating that the questionnaire has acceptable internal consistency. Table 5 summarizes, for each item, the kappa, the kappa with dichotomized items, the balanced kappa and the correlation coefficient between the measurements obtained in the cross-sectional study (1st stage) and those obtained in the sub-sample 4 years later (2nd stage). As shown in Table 5, the highest kappa estimates were obtained for items 1 (0.343), 5 (0.344) and 8 (0.341); no item had a kappa higher than 0.344. With the dichotomized kappa and the balanced kappa, values for the EAT-26 items were not higher than 0.461 and 0.510, respectively.
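The validity coefficients follow directly from the 2 x 2 table of EAT-26 results (cut-off ≥21) against CIDI diagnoses. The sketch below uses the counts given in the text: 4 true positives, 6 false negatives and 24 false positives, with 127 true negatives obtained as 151 - 24. The function name is illustrative. With these counts the misclassification rate is 30/161, about 18.6%; the 18.4% reported above corresponds to a denominator of 163.

```python
from typing import Dict


def validity_coefficients(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Screening-test coefficients against a reference standard (here the CIDI)."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),       # cases detected by the test
        "specificity": tn / (tn + fp),       # non-cases correctly negative
        "ppv": tp / (tp + fp),               # screen-positives who are cases
        "npv": tn / (tn + fn),               # screen-negatives who are non-cases
        "misclassification": (fp + fn) / n,  # all discordant results
    }


coefficients = validity_coefficients(tp=4, fp=24, fn=6, tn=127)
for name, value in coefficients.items():
    print(f"{name}: {value:.1%}")
```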
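Cohen's kappa for a single item compares the observed agreement between the two stages, p_o, with the agreement expected by chance from the marginal distributions, p_e: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of the plain and dichotomized forms follows. The text does not say how items were dichotomized, so the `dichotomize` helper, which treats any item score of 1 or more (a symptomatic answer) as positive, is a hypothetical rule; the balanced kappa is not covered.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def cohen_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between paired ratings (e.g. 1998 vs 2002)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    categories = counts_1.keys() | counts_2.keys()
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    if expected == 1:
        return math.nan  # both stages used one and the same category: undefined
    return (observed - expected) / (1 - expected)


def dichotomize(item_scores: Sequence[int], threshold: int = 1) -> List[int]:
    """Hypothetical rule: an item score >= threshold counts as symptomatic."""
    return [int(s >= threshold) for s in item_scores]
```

Kappa is 1 for perfect agreement, 0 when agreement is no better than chance, and negative when agreement is worse than chance, which is how the item values of at most 0.344 should be read.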

Discussion
The purpose of the present study was to determine whether the EAT-26 applied to a Brazilian population-based sample presented validity estimates similar to those obtained in previous studies. The main findings of our study led us to conclude that the EAT-26 exhibited low validity coefficients for sensitivity and positive predictive values, and poor temporal stability.
The EAT-26 is a screening questionnaire designed to identify abnormal eating behaviors, including AN and bulimia nervosa, disorders with low prevalence rates in the community. Because the positive predictive value depends on prevalence, validity studies of tests aimed at low-prevalence disorders will yield a low positive predictive value even when sensitivity and specificity are reasonable (26,27).
The previous validity studies were conducted on samples from populations considered to be at higher risk for eating disorders, in which the probability of detecting cases was higher (9)(10)(11)(12)(14)(16). This would be expected to yield a higher positive predictive value. All of these studies were based on samples from specific groups, such as students, clinic patients, and patients with eating disorders compared to controls. When Garner et al. (7) validated the test in 1982, they used equal sample sizes of AN and control individuals, and the results obtained were generalized to other populations. Williams et al. (26), however, emphasized that it is not appropriate to generalize EAT results obtained from specific samples to the community. It is therefore questionable whether the EAT is adequate as a screening questionnaire for population-based samples.
A test can be used with three different populations: 1) people with a high chance of presenting the disease (visibly ill), 2) those whose chance is minimal (visibly healthy), or 3) those at intermediate risk of developing the disease. Validating a test by comparing the extreme groups (the first with the second) makes cases easy to distinguish from non-cases and tends to yield satisfactory validity estimates. However, according to Sackett et al. (27), recognizing individuals who do not clearly present the disease (as is the case for non-specific inadequate eating behaviors) is a difficult task.
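The dependence of the positive predictive value on prevalence follows from Bayes' theorem: PPV = Se * p / (Se * p + (1 - Sp) * (1 - p)), where Se is sensitivity, Sp specificity and p prevalence. The sketch below (function name illustrative) plugs in the sensitivity (40%) and specificity (84%) observed in this study; at the sub-sample prevalence of 6.2% it gives a PPV of about 14%, matching the value reported in the Results.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a screen-positive person is a true case (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


for prevalence in (0.01, 0.062, 0.30, 0.50):
    ppv = positive_predictive_value(0.40, 0.84, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With the same sensitivity and specificity, the PPV would be about 52% at a prevalence of 30% and would fall to about 2.5% at a prevalence of 1%.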
To reduce these effects and to estimate the validity of the EAT accurately, the present study used a specific sampling method to increase the proportion of probable cases in a population-based sample. Through the sub-sample selection process, a deliberately inflated LAEB prevalence of over 30% was obtained. The prevalence of eating disorders according to the CIDI was 6.2% (10 cases). In the concurrent validity analysis, only 40% of the women diagnosed with AN or bulimia nervosa by the CIDI (4 of 10) had positive EAT-26 scores, indicating that the questionnaire has low sensitivity. None of the individuals with an EAT-26 score ≥21 (N = 28) were diagnosed as having AN. Of the 10 cases diagnosed by the interview, 6 did not have a positive EAT-26 score (false negatives), and there were 24 false positives, resulting in a low positive predictive value. Thus, the positive predictive value was low despite a prevalence of 6.2% in the sub-sample.
There are three main possible explanations for the high number of false positives found in this validity study: a) the current cultural ideal of slenderness may be responsible for high scores on many EAT items. Some items reflect eating practices that have become quite common; it is difficult to find individuals who are not often on a diet and do not control their sugar intake. Twenty years ago, these behaviors were not as frequent as they are today, and it is quite possible that this shift towards a greater emphasis on thinness increased the likelihood of positive answers and higher EAT scores; b) misclassification related to social and cultural factors. Mari and Williams (28) emphasized the importance of evaluating the effects of social and demographic factors when estimating the validity of a screening test, and the large number of participants with a low educational level may have increased the rate of positive answers to the EAT-26; and c) the interval between the two stages of the study was too long, and subjects' eating behaviors and attitudes actually changed.
Considerable difficulties remain in the use of standardized assessments such as the CIDI to diagnose eating disorders. Some features of the disorders themselves, such as the ego-syntonic nature of the illness, secrecy and denial, may partly explain why such instruments are difficult to use. In addition, the CIDI questions do not appropriately evaluate body image distortion, do not distinguish between subjective and objective binge eating, and do not mention "loss of control", all of which are major criteria in the diagnosis of eating disorders. The CIDI questions appear to need further refinement (29).
Comparisons of the EAT-26 scores from the two study periods suggest that the questionnaire has low temporal stability. Three forms of the kappa coefficient (kappa, kappa with dichotomized items and balanced kappa) were used to measure the temporal stability of the EAT-26 as thoroughly as possible. The instrument includes items that are not clearly formulated, are easily misunderstood, and probably do not measure what they were designed to measure. For instance, for the item "I avoid carbohydrates" there was zero agreement between the two periods; it is possible that the participants did not understand the word "carbohydrate". Likewise, the item "control over feeding" showed zero agreement.
Because the sampling strategy increased the proportion of probable cases in the sub-sample, it is reasonable to assume that these results were not influenced by the low prevalence of eating disorders in the community. They therefore cast doubt on the real ability of the test to identify abnormal eating behaviors in the population.