Reproducibility of a food frequency questionnaire for adolescents

To assess the reproducibility of a validated 76-item food frequency questionnaire designed to estimate diet in adolescents (Adolescent Food Frequency Questionnaire, AFFQ) in the city of São Paulo, Brazil, a test-retest study was conducted (n = 49). Intraclass correlation coefficients (ICC), weighted kappa, and percentage of agreement were calculated for both crude and energy-adjusted nutrient intakes. Bland-Altman plots were used to examine the limits of agreement for energy and macronutrients. The ICC ranged from 0.48 (carbohydrates) to 0.65 (vitamin C) in crude values and from 0.25 (total fat) to 0.58 (vitamin C) in adjusted values. Kappa values ranged from 0.28 (protein and fiber) to 0.56 (unsaturated fat). The Bland-Altman plots showed a trend towards larger differences in energy with increasing intake and a bias towards extreme values for fat intake. On average, 54.2% of individuals were classified in the same category on the two occasions. In conclusion, the Adolescent Food Frequency Questionnaire showed reasonable reproducibility and can be used in studies that aim to classify groups into intake categories.
Adolescent; Food Habits; Questionnaires; Reproducibility of Results
Marchioni DML et al. Cad. Saúde Pública, Rio de Janeiro, 23(9):2187-2196, set, 2007


Introduction
Diet is a relevant factor not only for growth and development, but also for the present and future health of adolescents 1,2,3. There is increasing evidence that diet in childhood and adolescence is related to diseases of adulthood like heart disease, osteoporosis, and cancer 4. Therefore, the evaluation and quantification of adolescents' habitual diet are of great concern. To meet both monitoring and nutritional research aims, a reliable instrument is needed that accurately and precisely assesses this group's dietary intake 5.
The use of food frequency questionnaires has been increasing in epidemiological studies among adolescents, since from adolescence onward cognitive processes are considered to be similar to those of adults 3,6,7,8,9. However, inaccurate diet measurements may obscure the relationship between diet and the target outcome 10. Errors are inherent to any assessment of dietary status, so studies that validate measurement instruments and investigate their reproducibility are needed, since they help to understand the relationship between measured and true values 10. Such studies aim to increase accuracy and reduce measurement bias 11. Reproducibility studies assess the extent to which results agree when obtained by the same approach at different points in time 12. A food frequency questionnaire should reproduce similar results when applied under similar circumstances, over a given time interval, and with the same participants 10,13,14.
The objective of this study was to assess the reproducibility of a validated 76-item food frequency questionnaire for assessing habitual diet in adolescents (Adolescent Food Frequency Questionnaire, AFFQ) in the city of São Paulo, Brazil.

Material and methods
This was a reproducibility study using a test-retest design.
Forty-nine high school sophomores (second-year students), 25 boys and 24 girls (ages 16-19 years), at a public school located on the West Side of the city of São Paulo were invited and agreed to participate in the study. The AFFQ was administered twice, in May and August of the same year, 2001. On both occasions, students were asked to complete the questionnaire in the classroom, in the presence of a field researcher, to help ensure that the questionnaires were filled out under the same conditions.

Adolescent Food Frequency Questionnaire
The instrument was designed to assess the amounts of foods consumed over the preceding six-month period. The development and validation of the AFFQ were described elsewhere 15. In brief, a list of 76 foods or food groups that contributed to at least 90% of energy and macronutrient intake by adolescents was compiled, based on dietary data from 24h food recall 16. Portion sizes were obtained from the same dataset 16. The questionnaire was self-administered, and students were asked to estimate the average frequency of their intake for each food or food group over the previous six months by selecting one of seven categories: never, less than once a month, one to three times a month, once a week, two to four times a week, once a day, twice or more a day. Nutrient intakes were calculated by multiplying the portion weight by its nutrient content. Specific software, Virtual Nutri version 1.0 (Faculdade de Saúde Pública, Universidade de São Paulo, São Paulo, Brasil), was used to analyze the energy and nutrient content of food intake. Here, food nutritional values were compiled from Brazilian tables, from food composition analyses furnished by manufacturers, or, for foods unavailable in Brazilian databases, from food composition tables from other countries, for example that of the United States Department of Agriculture (USDA) 17.
In the validation study, the instrument showed good accuracy (crude mean r = 0.57) 16.
Initially, we chose a list of nutrients for which to examine reproducibility. It included the macronutrients and nutrients related to frequently reported low intake in this young population, such as iron and calcium, and to common diseases in youth (anemia) and adulthood (cardiovascular diseases, cancer, and osteoporosis).
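As an illustration of how a frequency response can be turned into a daily nutrient estimate, the sketch below multiplies a daily frequency factor by the portion weight and the nutrient density of each item. The conversion factors in DAILY_FREQUENCY are assumed midpoints, not the values used by Virtual Nutri or by the study, and the foods, portions, and nutrient contents are invented for the example.

```python
# Hypothetical daily-frequency factors for the seven AFFQ response categories.
# The source does not report the conversion it used; these are assumptions.
DAILY_FREQUENCY = {
    "never": 0.0,
    "less than once a month": 0.5 / 30,
    "one to three times a month": 2 / 30,
    "once a week": 1 / 7,
    "two to four times a week": 3 / 7,
    "once a day": 1.0,
    "twice or more a day": 2.0,
}


def daily_nutrient_intake(responses, portion_g, nutrient_per_100g):
    """Sum frequency x portion weight x nutrient density over all food items."""
    total = 0.0
    for food, category in responses.items():
        grams_per_day = DAILY_FREQUENCY[category] * portion_g[food]
        total += grams_per_day * nutrient_per_100g[food] / 100.0
    return total


# Illustrative values only (not taken from the study).
responses = {"rice": "once a day", "milk": "twice or more a day"}
portion_g = {"rice": 150.0, "milk": 200.0}
protein_per_100g = {"rice": 2.5, "milk": 3.0}
protein_g_per_day = daily_nutrient_intake(responses, portion_g, protein_per_100g)
```

With these made-up inputs the estimate is 3.75 g from rice plus 12 g from milk per day; a real analysis would substitute the questionnaire's own portion sizes and food composition data.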

Statistical analysis
Nutrient intakes were log-transformed to improve normality. Most of the nutrients measured by the AFFQ were highly correlated with energy. It was therefore necessary to remove this variation because of its association with measurement error, and this was done by means of the residual method proposed by Willett & Stampfer 18.
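A minimal sketch of the residual method follows, assuming a simple linear regression of log nutrient intake on log energy intake, with the residuals re-centred on the nutrient intake predicted at mean energy. The function name energy_adjust and the data are illustrative and are not taken from the study.

```python
import math


def energy_adjust(nutrient, energy):
    """Residual-method adjustment: regress nutrient on energy, keep the
    residuals, and add back the value predicted at the mean energy intake."""
    n = len(nutrient)
    mean_e = sum(energy) / n
    mean_n = sum(nutrient) / n
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (v - mean_n) for e, v in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    expected_at_mean = intercept + slope * mean_e
    return [v - (intercept + slope * e) + expected_at_mean
            for e, v in zip(energy, nutrient)]


# Illustrative log-transformed values (not study data).
log_energy = [math.log(x) for x in (1800.0, 2200.0, 2600.0, 3000.0, 3400.0)]
log_fat = [math.log(x) for x in (60.0, 70.0, 95.0, 100.0, 130.0)]
adjusted_fat = energy_adjust(log_fat, log_energy)
```

By construction the adjusted values keep the original mean but are uncorrelated with energy, which is why the adjustment removes between-person variation that is explained by total intake.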
Intraclass correlation coefficients (ICC) were used to assess the reproducibility of the dietary intake reported by the individuals in the AFFQ test-retest. Differences between individual intakes reported by students in the first and second AFFQ were compared using the Wilcoxon signed-rank test.
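The text does not state which form of the ICC was computed. The sketch below assumes the one-way random-effects, single-measurement form for two administrations, built from the between-subject and within-subject mean squares; the function name and data are illustrative only.

```python
def icc_oneway(test, retest):
    """One-way random-effects ICC for two measurements per subject
    (an assumed form; the source does not specify the ICC model)."""
    n = len(test)
    k = 2
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    grand = sum(subject_means) / n
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Illustrative log-scale intakes for five subjects (not study data).
affq1 = [7.2, 7.5, 7.8, 8.0, 8.3]
affq2 = [7.0, 7.6, 7.7, 7.9, 8.1]
icc_example = icc_oneway(affq1, affq2)
```

The coefficient approaches 1 when the two administrations differ little relative to the spread between subjects, and falls as within-subject disagreement grows.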
The percentage concordance was determined by identifying the percentages of individuals classified in the same, adjacent, and opposite categories.
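The cross-classification can be sketched as follows, assuming rank-based tertiles computed separately for each administration. Tie handling (here, by position in a stable sort) is an assumption, and the data are invented.

```python
def tertile_ranks(values):
    """Assign 0, 1 or 2 by rank so each tertile holds about a third of subjects."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = position * 3 // len(values)
    return ranks


def cross_classification(test, retest):
    """Percentage of subjects in the same, adjacent, and opposite tertiles."""
    t1, t2 = tertile_ranks(test), tertile_ranks(retest)
    n = len(test)
    gaps = [abs(a - b) for a, b in zip(t1, t2)]
    return {
        "same": 100.0 * gaps.count(0) / n,
        "adjacent": 100.0 * gaps.count(1) / n,
        "opposite": 100.0 * gaps.count(2) / n,
    }


# Illustrative intakes for six subjects (not study data).
agreement = cross_classification([10, 20, 30, 40, 50, 60],
                                 [12, 35, 28, 45, 15, 70])
```

In this toy example three of six subjects stay in the same tertile, two move to an adjacent tertile, and one moves between the lowest and highest tertiles.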
Bland-Altman plots were used to examine the limits of agreement for energy and macronutrients between AFFQ1 and AFFQ2 19. For each individual, the difference in log-transformed intake between the two AFFQs was plotted against the average of the two measurements 19.
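The quantities behind such a plot can be sketched as below, assuming the conventional limits of agreement of the mean difference plus or minus 1.96 standard deviations of the differences. The function name and data are illustrative.

```python
import math
import statistics


def bland_altman(first, second):
    """Per-subject means and differences, plus bias and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {"means": means, "diffs": diffs, "bias": bias,
            "lower": bias - 1.96 * sd, "upper": bias + 1.96 * sd}


# Illustrative log-scale energy intakes (not study data).
result = bland_altman([7.2, 7.5, 7.8, 8.0, 8.3], [7.0, 7.6, 7.7, 7.9, 8.1])
# On the log scale, exp(bias) is the geometric mean ratio AFFQ1 / AFFQ2.
ratio = math.exp(result["bias"])
```

A positive bias means the first administration tends to give higher values; plotting diffs against means then shows whether the disagreement changes with the level of intake.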
To simultaneously examine the contribution of gender to intake reproducibility, we estimated the partial correlations for total energy intake. Gender was not significantly related to reproducibility (r = 0.08, p = 0.589), so the analysis was conducted with no further gender distinction.
To verify concordance of the ordinal data, i.e. in the categories given by intake in tertiles, weighted kappa was used. This was derived from the kappa coefficient, with weights based on the magnitudes of the observed discordance 20.
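A sketch of the weighted kappa for tertile categories follows. Linear agreement weights are assumed here, since the weighting scheme is not stated in the text, and the example categories are invented.

```python
def weighted_kappa(cat1, cat2, n_categories=3):
    """Weighted kappa with linear agreement weights w = 1 - |i - j| / (k - 1).
    The weighting scheme is an assumption, not taken from the source."""
    n = len(cat1)
    k = n_categories
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(cat1, cat2):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    po = pe = 0.0
    for i in range(k):
        for j in range(k):
            w = 1.0 - abs(i - j) / (k - 1)
            po += w * observed[i][j]   # weighted observed agreement
            pe += w * row[i] * col[j]  # weighted agreement expected by chance
    return (po - pe) / (1.0 - pe)


# Illustrative tertile assignments (0 = low, 1 = medium, 2 = high).
kappa_example = weighted_kappa([0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 0, 2])
```

Compared with the unweighted coefficient, a move to an adjacent tertile earns partial credit and only a move between the lowest and highest tertiles counts as full disagreement.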
The study was approved by the Research Ethics Committee of the School of Public Health, University of São Paulo, and the parents and adolescents provided informed consent.

Results
Table 1 presents means and standard deviations of energy and nutrient intake in the AFFQ test and retest. For the majority of the nutrients assessed, the values found in the retest were lower than in the initial test. Statistically significant differences between AFFQ1 and AFFQ2 were observed for energy, carbohydrates, total fat, unsaturated fat, fiber, and iron.
Intraclass correlation coefficients (ICC) between the two AFFQs ranged from 0.48 (carbohydrates) to 0.65 (vitamin C) in crude values (mean value 0.56) and from 0.25 (total fat) to 0.58 (vitamin C) in adjusted values (mean value 0.43), as shown in Table 2. All ICC values decreased after energy adjustment.
Table 3 presents the results for the capacity of the AFFQ to classify individuals within the same tertile, adjacent tertile, and opposite tertile of nutrient intake, after adjusting for energy, between the test and retest. The proportion of individuals classified in the same tertile averaged 54.2%, ranging from 44.8% (protein) to 63.2% (vitamin C). The percentage of individuals classified in opposite tertiles ranged from 6.1% for unsaturated fat to 14.3% for carbohydrates, total fat, and cholesterol (mean value 11.3%). The values of the weighted kappa coefficient ranged from 0.31 for total fat and cholesterol to 0.56 for unsaturated fat (mean value 0.42).
Bland-Altman plots were used to further examine the limits of agreement between the two applications of the AFFQ, by plotting individual differences against the mean of the two questionnaires (Figure 1). The nutrients chosen for the analysis were energy, total fat, carbohydrate, and protein. The mean differences for energy and macronutrients were all positive, indicating a bias towards higher reported intake in AFFQ1. The Pearson correlation coefficients between the differences and the means of the two questionnaires were all positive, but not statistically significant (energy r = 0.090, p = 0.538; protein r = 0.141, p = 0.333; carbohydrate r = 0.004, p = 0.981; lipid r = 0.231, p = 0.111).

Discussion
This paper reports on the reproducibility of a previously validated food frequency questionnaire, designed to estimate adolescents' habitual food intake. Our results indicate reasonable reproducibility for categorizing individuals according to levels of intake, the usual aim of epidemiological studies.
The mean kappa value was 0.42, thus indicating fair reliability 21. This was similar to the findings by Hansson et al. 22, who reported a median value of 0.40 for the weighted kappa coefficient.
The crude concordance ranged from 44.8% for protein to 63.2% for vitamin C, with a mean of 54.2% of individuals classified in the same tertile. The low percentage of individuals classified in opposite tertiles (11.3%) was noteworthy. Considering the large variability observed in adolescents' food intake 23, this percentage of concordance is considered reasonable for reproducibility studies involving this type of instrument (food frequency questionnaire) and individuals at this stage of life 9. The values obtained in the present study are similar to those found by Vereecken & Maes 9 in a study conducted in Belgium.
The correlations observed in this study are consistent with those from other studies among adolescents. In Rockett et al. 5, the Pearson correlation coefficients ranged from 0.26 for protein and iron to 0.58 for calcium. The correlations observed by Speck et al. 24 ranged from 0.08 to 0.76, with a two-week interval between applications of the questionnaire. Field et al. 25, assessing the performance of a frequency questionnaire applied with a one-year interval, found correlations from 0.18 to 0.47. Robinson et al. 26 reported correlation coefficients ranging from 0.44 (protein) to 0.52 (fat).
The time frame is also of concern. The recommendation is that the replication interval should be neither too short (to the point that subjects are able to remember and reproduce the answers by means of a learning process) nor too long (in the sense that it may reflect true changes in diet). In our study, the AFFQ was repeated after a three-month interval, which is considered reasonable 8. However, the reference period can also capture changes in intake due to seasonality. This may have happened here, lowering the observed correlations, especially for nutrients whose main sources are fruits and vegetables, like vitamin C and retinol.
The Bland-Altman plots showed a trend towards larger differences in energy with increasing intake and a bias towards extreme values for fat intake. Although these plots are also recommended to graphically investigate potential bias when assessing a method's repeatability, they have been used more extensively in diet validation studies than in reproducibility studies.
To produce reliable estimates of habitual intake by means of a food frequency questionnaire, a crucial ability is to deal with abstract concepts and form a mental image of the diet that is as close as possible to the truth 27. According to Goodwin et al. 28, children 13 to 17 years of age were capable of completing the food frequency questionnaire without their parents' help and of satisfactorily providing dietary information. Likewise, Field et al. 25 found better reproducibility in sixth and seventh-grade as compared to fourth and fifth-grade students (even if the students were the same age). According to Drewnowski 27, the mental image of the habitual diet is also influenced by the foods that individuals like or dislike, thereby becoming a measurement of their attitude towards their own diet. This may introduce errors that increase the variability of the responses and produce lower correlations. Specific self-reported food intakes are generally higher in the first interview than in the second, as pointed out by McPherson et al. 29. The current study was also consistent with this observation. Cullen & Zakeri 30, in a reproducibility study that considered ethnicity, also found higher values in the first interview, as did Robinson et al. 26 in their study to assess a food frequency questionnaire in adolescent girls in the United Kingdom. Nevertheless, a reproducibility test by Rockett et al. 5 reported similar mean values for energy in the first and second interviews.
Because of the correlation identified between the energy and nutrient intakes in the study population, ICCs were calculated before and after adjustment for energy. After adjustment, all coefficients decreased in relation to the crude value. Yaroch et al. 31 observed the same downward trend. Adjustment for energy would increase the correlation coefficients when the variability in nutrient intake is related to energy intake, but would decrease them when the nutrient variability depends on systematic errors of overestimation or underestimation 11. The foods mentioned by adolescents may have low nutritional value (soft drinks, sugar, and fried foods) and high caloric value, thereby favoring a tendency towards exaggerating the diet's nutrient content. Thus, such disproportional results may affect the AFFQ's concordance 4.
Livingstone et al. 32, in a validation study in the early 1990s, called attention to bias caused by lack of compliance (resulting from irritation and boredom) in completing food questionnaires. All survey methods ultimately depend on subjects' motivation, compliance, and ability to accurately report their habitual food intake. Although great effort was made in this study to ensure adolescents' compliance and motivation, we cannot rule out that some may have gotten bored while filling out the questionnaire.
To our knowledge, this is the first food questionnaire developed and validated in Brazil that assesses this age group. The study complements the assessment of this questionnaire. Although good reproducibility does not ensure validity, reliability studies are often conducted, since they provide information on the answers' consistency and stability 10. The questionnaire's performance differed according to the nutrients, as expected. The worst results were for carbohydrates, total fat, and iron. Carbohydrates and cholesterol also showed the worst classification agreement according to intake tertiles. Even so, the low percentage of individuals classified in opposite tertiles (11.3%) drew our attention. It is well known that errors are inherent to any assessment of dietary status. Since epidemiological studies rely on food frequency questionnaires because of their rapid application and low cost, evidence of likely misclassification of individuals based on these dietary measures is essential for our understanding of diet-related disease risk. As an age group, adolescents have characteristics that need to be taken into account when applying such a research instrument. They need to be highly motivated and have enough time to fill out the questionnaire. In order to enhance their ability to adequately record habitual diet, recall strategies and recall cues can be applied 33,34.
In conclusion, a food frequency questionnaire for adolescents that was developed and previously validated achieved reasonable reproducibility and can be used for studies that aim to classify groups into low, medium, and high consumers.

Figure 1

Contributors
D. M. L. Marchioni was responsible for the statistical analysis, interpretation of the results, and drafting of the manuscript. S. M. Voci, R. M. Fisberg, and F. E. L. Lima collaborated in the interpretation and discussion of the results. B. Slater was responsible for the research design, implementation, and consultancy.

Table 1
Means and standard deviations of daily nutrient intakes estimated by two Adolescent Food Frequency Questionnaires (AFFQ1 and AFFQ2) in adolescents.

Table 2
Intraclass correlation coefficients (ICC) and confidence intervals (95%CI) for crude and adjusted energy and nutrient intake estimated by two Adolescent Food Frequency Questionnaires (AFFQ) in adolescents.

Table 3
Percentage classification of adolescents in tertiles, weighted kappa, and crude concordance (p) for adjusted energy intake.