Use of Linear Regression for Correction of Dietary Data

OBJECTIVE: Methodological approaches have been developed to minimize the effects of measurement error in dietary intake data. The objective of this study was to apply a strategy for correcting intake data for measurement error. METHODS: Intake data were obtained by applying a Food Frequency Questionnaire to 79 adolescents in the city of São Paulo, Brazil. Dietary intake data were corrected by linear regression after energy adjustment by the residual method. The reference method was the 24-hour dietary recall, applied three times. CONCLUSIONS: The means and standard deviations of the corrected values indicate that measurement error was corrected. The performance of these methods, which are imperfect, is questionable when their assumptions are not met, which is common in intake studies whose measures are based on individuals' reports.


INTRODUCTION
The greatest limitation of epidemiological studies assessing the association between diet and disease has been the difficulty of measuring individuals' usual dietary intake precisely and accurately. This limitation has led to an increase in the number of studies assessing the performance of instruments for collecting dietary information, especially the Food Frequency Questionnaire (FFQ). 3
Although it has become the most frequently used tool in nutritional epidemiology studies, the FFQ also has limitations regarding memory and perception, 8 lack of standardization of the tool and of interviewer training, problems in the structure of the tool, and biases. 12,18
Over a decade ago, Beaton 3 stated that "dietary intake cannot be estimated without error and probably never will be". In light of this statement, error can be understood as a statistical concept, and not as a mistake in data collection. Indeed, it is acknowledged that the measurement of individuals' usual food intake is subject to systematic and random errors. Systematic error, or bias, occurs on average for all individuals measured. Random error is mainly due to day-to-day fluctuations and varies among individuals, with zero mean, providing less precise measures. 2
Correlations between diet estimates from the FFQ and from reference methods are usually between 0.3 and 0.7, suggesting substantial error. 7 The literature in this area shows that the measures of association observed for food intake in epidemiological studies are relatively low, generally below 2.0. Thus, associations may go undetected because the diet measure lacks accuracy and precision. 18
Thus, methodological strategies have been developed in an attempt to estimate parameters correctly, such as validation studies and, more recently, calibration studies.
Calibration can be defined as the process of determining the relationship between two measurement scales. This statistical methodology aims to bring the measures obtained with the FFQ closer to true intake values, as estimated by reference methods assumed to be free from bias (systematic error), and the reference method is not necessarily applied more than once. 13 Thus, corrected values are obtained that are partially free from the errors present in the FFQ.
The main objective of calibration studies is to use the information obtained to adjust the measures of association that will be estimated in the main epidemiological study, by correcting the error associated with use of the FFQ. 4,14,18 This approach is carried out a priori, using a parametric or non-parametric regression model. 10,13,14
Among the parametric methods, calibration using a linear regression model 14 is considered the standard: 17 the intake estimate obtained by the reference method is modeled as a function of the FFQ intake. The linear regression model obtained in the calibration study may then be used as a prediction model to estimate true intake from an FFQ value. In this approach, both the systematic and the random errors are incorporated in the intake measure obtained with the FFQ. 14 When used afterwards in the analysis of the association between disease and food intake in the main study, calibrated values may eliminate or significantly reduce errors that could affect risk estimates. 10,17
Thus, the objective of the present study was to describe the application of a strategy for calibrating dietary data for measurement error.

METHODS
We used the database of the FFQ validation study carried out in 1999 by Slater et al, 16 with 79 adolescents aged 14 to 18 years attending a public school in São Paulo. Details of the methodology are given elsewhere. 16
For this study, we used energy and macronutrient intake data from all individuals who completed at least three 24-hour dietary recalls (24-hour recall) and one Food Frequency Questionnaire for Adolescents (FFQA).
Food intake recorded by 24-hour recall and FFQA was converted into energy and nutrients using the Virtual Nutri software, modified with respect to the nutritional values of foods and the inclusion of tested recipes. 6
Data from individuals whose energy intake was between 500 kcal and 6000 kcal were included in the study, as recommended by Andrade et al. 1 Initially, means and standard deviations were calculated for total energy and macronutrients of the diet according to the 24-hour recall and the FFQ. Normality of the distributions was tested with the Kolmogorov-Smirnov test. Estimates were then adjusted for energy using the residual method, 18 in order to estimate the fraction of each nutrient that was not correlated with total energy intake.
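The energy adjustment step can be illustrated with a minimal Python sketch. It assumes the common form of the residual method, in which each nutrient is regressed on total energy and the residual is added to the nutrient intake predicted at the mean energy intake; the data values, the function names `ols` and `energy_adjust`, and the inclusive treatment of the 500 to 6000 kcal bounds are hypothetical choices made for illustration, not details taken from the study.

```python
from statistics import fmean


def ols(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def energy_adjust(nutrient, energy):
    """Residual method (assumed form): residual of the nutrient-on-energy
    regression plus the nutrient intake expected at the mean energy intake."""
    a, b = ols(energy, nutrient)
    expected_at_mean = a + b * fmean(energy)
    return [n - (a + b * e) + expected_at_mean
            for n, e in zip(nutrient, energy)]


# Hypothetical (energy in kcal, protein in g) pairs, for illustration only.
records = [(1800.0, 60.0), (2200.0, 85.0), (2500.0, 80.0),
           (3000.0, 110.0), (3400.0, 105.0), (7200.0, 190.0)]

# Keep plausible energy intakes only (bounds treated as inclusive here).
plausible = [(e, n) for e, n in records if 500.0 <= e <= 6000.0]
energy = [e for e, _ in plausible]
protein = [n for _, n in plausible]

adjusted = energy_adjust(protein, energy)
```

By construction, the adjusted values keep the mean of the crude nutrient values and are uncorrelated with total energy intake, which is the property the residual method is meant to deliver.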
To correct the data for a given nutrient, let $x_i$ be the true usual intake of individual $i$ and $z_{ij}$ the intake estimated from the $j$-th 24-hour recall ($j = 1, 2, 3$), and let $\bar{z}_i$ be the average of the three 24-hour recall values for that nutrient. To obtain the calibrated FFQA values, the average of the 24-hour recall values ($\bar{z}_i$) is regressed on the FFQA values, using the classical error model as a basis:
$$Z = X + \varepsilon_z$$
where $X$ is the true value, $Z$ is the intake obtained by 24-hour recall, and $E(\varepsilon_z) = 0$. Therefore, $E(X) = E(Z)$.
Assuming a linear relationship between the FFQA values and the 24-hour recall values, we used the linear regression calibration method recommended by Rosner et al 14 to predict the true energy and nutrient intake, $x_i$, from the FFQA nutrient value, $Q_i$:
$$x_i = \alpha + \lambda Q_i + \varepsilon_{Q_i} \qquad (1)$$
Estimates of $\alpha$ and $\lambda$ were obtained by regressing $\bar{z}_i$ on $Q_i$.
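The calibration step described above can be sketched as follows. This is a minimal illustration, assuming ordinary least squares on the mean of the three 24-hour recalls; the intake values and the function names `ols` and `calibrate` are hypothetical and are not taken from the study data.

```python
from statistics import fmean


def ols(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def calibrate(ffq, recalls):
    """Regress the mean 24-hour recall on the FFQ value and return
    (alpha, lambda, calibrated FFQ values, mean recall per individual)."""
    z_bar = [fmean(r) for r in recalls]
    alpha, lam = ols(ffq, z_bar)
    calibrated = [alpha + lam * q for q in ffq]
    return alpha, lam, calibrated, z_bar


# Hypothetical energy intakes (kcal): one FFQ value and three recalls each.
ffq = [1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
recalls = [
    [1800.0, 1900.0, 2000.0],
    [2000.0, 2300.0, 2000.0],
    [2400.0, 2600.0, 2200.0],
    [2700.0, 2300.0, 2500.0],
    [3000.0, 2800.0, 2900.0],
]

alpha, lam, calibrated, z_bar = calibrate(ffq, recalls)
print(f"alpha = {alpha:.1f}, lambda = {lam:.2f}")
```

With these illustrative numbers the slope is well below 1, so high FFQ values are pulled down and low ones pulled up toward the reference mean, while the mean of the calibrated values equals the mean of the 24-hour recall averages.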
In this situation, the slope of the estimated regression line, represented by $\lambda$, is the key information for correcting the error in the association between the outcome and the estimated nutrient intake in the context of an epidemiologic study.
Assuming normality of $X$, $Q$ and $\varepsilon_Q$ in the linear model defined in equation (1), the variance of the predicted values, $\mathrm{Var}(X)$, is estimated as the variance of the calibrated measures from the questionnaire, according to Kaaks et al. 10
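As a brief worked step, and assuming that the calibrated value for individual $i$ is the fitted value from the least-squares regression of the mean 24-hour recall on the FFQA value, the variance of the calibrated measures can be written in terms of the variance of the questionnaire values. The hats and the symbols $s_Q$, $s_{\bar z}$ (sample standard deviations) and $r$ (Pearson correlation between $Q$ and $\bar z$) are notation introduced here for illustration.

```latex
\widehat{x}_i = \hat\alpha + \hat\lambda\, Q_i
\quad\Longrightarrow\quad
\mathrm{Var}(\widehat{x}) = \hat\lambda^{2}\, \mathrm{Var}(Q)

% With the least-squares slope \hat\lambda = r\, s_{\bar z} / s_Q :
\mathrm{SD}(\widehat{x}) = |\hat\lambda|\, s_Q = |r|\, s_{\bar z} \;\le\; s_{\bar z}
```

Under these assumptions, whenever the correlation is below 1 in absolute value, the calibrated values are less dispersed than the mean 24-hour recall values.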

RESULTS
Of the 79 adolescents in the sample, 39 were male, and the mean age was 15 years.
The relationship between the dietary data obtained by the two instruments is shown in the Figure as scatter plots for energy and macronutrients.
The results of applying the calibration method to correct the energy and macronutrient intake data are presented in Table 1. The first two rows for energy show the descriptive statistics of food intake measured by FFQA and 24-hour recall, followed by the calibrated value. For the remaining nutrients, crude and energy-adjusted values are presented, followed by the calibrated values.
The 24-hour recall, FFQA and calibrated values were close for energy, with slight overestimation by the FFQA. For lipids, too, the values obtained by the two methods were close, with slightly higher estimates from the 24-hour recall. For proteins and carbohydrates, slightly higher values were recorded by the 24-hour recall and the FFQA, respectively. After calibration, the FFQA mean became similar to the energy-adjusted 24-hour recall mean, as desired (non-significant differences in the paired t test). However, this consistency is obtained at the expense of variation: the variation in food intake decreases for the calibrated values, as shown by the standard deviation of the calibrated measure compared with that of the original values.
Estimated values of the α and λ parameters of the linear regression model, with their standard errors, and Pearson's correlation coefficients are presented in Table 2.
A high correlation coefficient indicates that the intake of a given nutrient can be measured by the FFQA in a way comparable to the 24-hour recall. In calibration using linear regression, 14 the method used in the present study, the intercept should be approximately zero and the slope, represented by λ, approximately 1. These characteristics indicate absence of bias in the questionnaire, that is, the mean intake estimated by the questionnaire would equal the mean intake estimated by the reference method. In practice, the slope of the regression line was lower than 1.
The regression coefficient was closest to the desired value for energy (λ = 0.89), indicating excellent performance of the instrument. Similar studies in the literature report lower values, ranging from 0.09 to 0.45 according to gender and ethnicity. 9,17 For carbohydrate, the coefficient obtained was reasonable and comparable to that in the study by Stram et al 17 in Hawaii, which ranged from 0.41 to 0.54 in men and from 0.20 to 0.73 in women.
On the other hand, for lipids and proteins, considerable bias was observed. The calibration factor λ, which should ideally be 1, was 0.89 for energy. For macronutrients, the factors were 0.41, 0.22, and 0.20 for carbohydrates, lipids, and proteins, respectively.

DISCUSSION
The present study is the first published in Brazil to present one of the strategies used to correct FFQ dietary data for measurement error. Studies addressing calibration methodology are rare in the literature and, despite the clear definitions distinguishing validation from calibration, it is still common to find procedures and statistical analyses incorrectly interpreted or named in published articles, especially with regard to calibration. In the present study, variables were adjusted for energy using the residual method, which is widely used in the literature, with close agreement between the adjusted and the original means. Energy adjustment is motivated both by the need to consider isocaloric models and by the need to control the measurement error inherent in the methods. 5,11,18 Pearson's correlation coefficients after calibration were high for energy and low for macronutrients, consistent with the estimates of the calibration factor.
A premise of the method is normality of the distribution of the variables in the model. 7 In the present study, the variables were normally distributed, and cut-off points for excluding implausible values were established.
The attenuation observed in the slope may be partially explained by bias in reported intake and by error in the estimate obtained with the reference method. This effect may arise from differences between the methods in the time period covered and in the dietary data used to estimate nutrient intake. 4,15
Another explanation for the attenuation of the λ coefficient is violation of the theoretical assumptions of the calibration method, such as independence between the errors of the two dietary assessment methods, absence of systematic error in the reference method, and independence between errors and true intake. 18
One of the results observed was the reduction in the standard deviation of the calibrated data. This behavior has also been described in other studies. 12,15 Calibrated data vary less than the original data because the misclassification of individuals is corrected. Extreme values are especially affected by linear correction because of the assumption of linearity between the reference method and the questionnaire. 13
Fraser and Stram 7 demonstrated that the bias present when crude FFQ data are used in a multivariate regression to estimate the effect of diet on disease is essentially eliminated by a calibration regression. Thus, the use of the FFQA in epidemiological studies without correction may lead to the situation described above. The authors call attention to the extensive literature on the diet-disease association, which in general does not make use of error correction techniques, and this may explain conflicting or inconsistent results.
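The attenuation and its correction can be illustrated with a small simulation. This is a univariate sketch under idealized assumptions that are not taken from the cited work: true intake and all errors are normal, the questionnaire and reference errors are independent of each other and of true intake, and the reference method is unbiased. All variable names and parameter values are hypothetical.

```python
import random
from statistics import fmean


def ols(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


random.seed(0)
n = 20000
beta = 2.0  # assumed true effect of intake on the outcome

x = [random.gauss(0.0, 1.0) for _ in range(n)]        # true intake
q = [xi + random.gauss(0.0, 1.0) for xi in x]         # questionnaire value
z = [xi + random.gauss(0.0, 0.5) for xi in x]         # reference method value
y = [beta * xi + random.gauss(0.0, 1.0) for xi in x]  # outcome

# Naive analysis: outcome regressed on the error-prone questionnaire value.
_, beta_naive = ols(q, y)

# Calibration slope: reference method regressed on the questionnaire value.
_, lam = ols(q, z)

# Regression calibration: dividing by lambda undoes the attenuation.
beta_corrected = beta_naive / lam

print(f"naive: {beta_naive:.2f}, lambda: {lam:.2f}, "
      f"corrected: {beta_corrected:.2f}")
```

In this setup the expected calibration slope is 0.5, so the naive coefficient is roughly half the assumed true effect, and dividing by the estimated slope recovers a value close to it. If the errors of the two methods were correlated, which the discussion above names as a plausible violation, this simple correction would no longer be unbiased.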
In the present study, the linear calibration approach made the corrected values more similar to the mean of the reference values, which points to a reduction in measurement error. However, applying the linear regression method presented here raises issues that should be discussed further. One issue worth mentioning is the performance of this approach when its theoretical assumptions are not met, which is common in dietary studies that use measurement methods based on individuals' reports as a reference.
Thus, there is a need to study new methodologies for correcting measurement error, or to search for alternatives that make the FFQA a less imperfect instrument.

Figure. Scatter plots of 24-hour recall values against Food Frequency Questionnaire for Adolescents values for energy and energy-adjusted macronutrients. City of São Paulo, Southeastern Brazil, 1999. (N=79)

Table 1. Descriptive statistics of energy and macronutrient intake obtained by FFQA and 24-hour recall for 79 individuals, before and after energy adjustment and correction. City of São Paulo, Southeastern Brazil, 1999.

Table 2. Estimates of α and λ parameters, standard-error and
considerable bias was observed (λ = 0.22 and λ = 0.20, respectively), demonstrating sizeable correction to approximate the reference value. Values of λ obtained by Kaaks et al 10 for protein intake in the pilot phase of the