Assessing Usual Dietary Intake in Complex Sample Design Surveys: the National Dietary Survey

ABSTRACT
The National Cancer Institute (NCI) method allows the distributions of usual intake of nutrients and foods to be estimated. This method can be used in complex surveys. However, the user must perform additional calculations, such as balanced repeated replication (BRR), to obtain standard errors and confidence intervals for the percentiles and mean of the usual intake distribution. The objective is to highlight adaptations of the NCI method using data from the National Dietary Survey. The application of the NCI method was exemplified by analyzing total energy (kcal) and fruit (g) intake, comparing estimates of the mean and standard deviation based on the complex design of the Brazilian survey with those assuming a simple random sample. Although the point estimates of the means were similar, estimates of the standard error using the complex design increased by up to 60% compared with those assuming a simple random sample. Thus, valid estimates of food and energy intake for the population must take all of the sampling characteristics of the survey into account; when these characteristics are neglected, statistical analysis may produce underestimated standard errors that compromise the results and the conclusions of the survey.

172S Usual dietary intake in complex sample Barbosa FS et al

INTRODUCTION
For most epidemiological diet analyses, usual intake is required, yet in many situations, such as large surveys, only one or two 24-hour recalls (24HR) or food records are collected. These methods yield an excessive amount of within-person variation,4 and early attempts to compensate for this limitation by averaging over a small number of days1 do not adequately represent usual individual intakes. Thus, more sophisticated methods based on statistical modelling were developed,2 paying special attention to the problems that are inherent in modelling usual intake of foods or food groups that are episodically consumed. Challenges for the statistical modelling of usual intake include the following: the ratio of within-person to between-person variation; reported days without consumption; consumption-day amounts that are positively skewed, with extreme values in the upper tail; the correlation between the probability of consumption and the consumption-day amount; and the use of covariate information on usual intake.

The National Cancer Institute (NCI) method was designed to meet all of these challenges by allowing efficient estimation of the usual intake distributions of daily and episodically consumed items.13 The method also allows individual intakes to be predicted for use in a model that assesses the relationship between diet and disease or another variable,6 and allows the effects of individual covariates on consumption to be assessed.12 An extension of the NCI method has also been used to estimate the population distributions of the ratios of usual intakes of dietary components.5

The objective of this article is to indicate the adaptations of the NCI method that are necessary when estimating the distributions of usual intakes of nutrients and foods from surveys with complex sample designs.

THE NATIONAL CANCER INSTITUTE METHOD
The premise of the NCI method is that usual intake equals the probability of consumption on a given day multiplied by the average amount consumed on a consumption day. For dietary components that are consumed nearly every day, the probability is taken to be one: if less than 5% of the population had zero intake of a food, an amount-only model, referred to as a one-part model, can be used. For a two-part model, the first part estimates the probability of consumption using logistic regression with a person-specific random effect. The second part specifies the consumption-day amount using linear regression on a transformed scale, also with a person-specific effect. The two parts are linked by allowing the two person-specific effects to be correlated and by including common covariates in both parts of the model. Covariates may be included, particularly if there is interest in subpopulations.13

The NCI method requires a minimum of two non-consecutive 24HR or records for at least a representative sample of individuals from the population of interest. It is intended for use on large datasets, with sample sizes of at least 1,000, especially if distributions in population subgroups are to be estimated.5,13

The NCI has developed macros (footnote a) for the Statistical Analysis System (SAS) program. In short, for a single dietary component, two macros are available. The first macro, MIXTRAN, transforms the data and fits the model. The second macro, DISTRIB, uses the parameters estimated by MIXTRAN to estimate the usual intake statistics through simulation. DISTRIB can also provide the estimated percentage of the population whose usual intake falls below a given value.13

The standard errors and p-values output by the MIXTRAN macro are only valid for an analysis of a simple random sample. Special care must therefore be taken when using the NCI SAS macros to analyze data from a complex survey: calculating these standard errors requires additional programming to implement a replication method such as balanced repeated replication (BRR).
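The two-part premise described in this section can be illustrated with a small simulation. The following Python sketch is not the NCI macros and does not fit any model; it only generates data from a two-part structure under assumed, hypothetical parameter values (all constants below are invented for illustration, and a log transformation stands in for the transformed scale). It shows why the mean of two recorded days is more dispersed than usual intake, which is the problem the method addresses.

```python
import math
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA_PROB = -0.5          # intercept of the logistic (probability) part
BETA_AMT = 4.5            # intercept of the amount part on the log scale
SD_U1, SD_U2 = 0.8, 0.4   # SDs of the two person-specific effects
RHO = 0.5                 # correlation between the person-specific effects
SD_WITHIN = 0.6           # within-person SD of the log amount


def simulate(n_people=5000, n_days=2, seed=1):
    """Return (usual intakes, means of the recorded days) for simulated people."""
    rng = random.Random(seed)
    usual, observed = [], []
    for _ in range(n_people):
        # Correlated person-specific effects link the two parts of the model.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = SD_U1 * z1
        u2 = SD_U2 * (RHO * z1 + math.sqrt(1 - RHO ** 2) * z2)
        # Part 1: probability of consumption on a given day.
        p = 1 / (1 + math.exp(-(BETA_PROB + u1)))
        # Part 2: person's mean consumption-day amount (mean of a lognormal).
        mean_amount = math.exp(BETA_AMT + u2 + SD_WITHIN ** 2 / 2)
        # Usual intake = probability x average consumption-day amount.
        usual.append(p * mean_amount)
        days = []
        for _ in range(n_days):
            if rng.random() < p:
                days.append(math.exp(BETA_AMT + u2 + rng.gauss(0, SD_WITHIN)))
            else:
                days.append(0.0)  # a reported day without consumption
        observed.append(sum(days) / n_days)
    return usual, observed


usual, observed = simulate()
print("mean usual intake:", round(statistics.mean(usual), 1))
print("SD of usual intake:", round(statistics.pstdev(usual), 1))
print("SD of 2-day means:", round(statistics.pstdev(observed), 1))
```

In this sketch the two-day means have roughly the same average as usual intake but a larger standard deviation, because they still carry within-person variation.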

Standard Errors in complex samples
Complex samples (CS) differ from simple random samples (SRS) in that SRS designs assume independence of observations, whereas CS designs do not. In complex sample design surveys, the standard error estimates are generally too small, and therefore biased, if the differential weighting of individuals and the complexity of the sample design (i.e., the identification of strata and primary sampling units, PSUs) are ignored.7 Replication methods are one approach to handling this problem: the estimate is recomputed in each of a set of subsamples, and the variability among these replicate estimates is used to compute the standard error of the full-sample estimate.14

There are various replication methods for creating subsamples. In this study, a variation of the balanced repeated replication (BRR) method known as Fay's BRR method was used.3,8 BRR is a variance estimation method for designs with two PSUs per stratum. BRR forms a half-sample replicate by selecting one unit from each pair of PSUs and weighting the selected unit by 2 so that it represents both units. Consequently, every PSU appears in each replicate, although half of them are weighted to zero. In Fay's BRR method, in contrast, observations in the sample PSUs that are not chosen for a replicate are not zeroed out. Instead, their sampling weights are diminished by a multiplicative factor K (K is a proportion), whereas the observations in the sample PSUs that are chosen for the replicate have their sampling weights enhanced by the multiplicative factor (2 - K). Setting K = 0 yields the standard BRR technique. A commonly recommended value for Fay's method is K = 0.3. For example, when K = 0.3, the weights are reduced to 30% of their original values in one half-sample and are increased to 170% of their original values in the other half-sample.

Fay's BRR method was developed for the specific situation in which there are two PSUs per stratum; however, the Brazilian Dietary Survey has more than two PSUs per stratum. To overcome this restriction, the usual approach is to randomly divide the PSUs in each stratum into two groups and then apply the BRR procedure. This is the so-called grouped balanced half-sample (GBHS) method.
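As a rough illustration of the weighting scheme just described, the Python sketch below builds Fay replicate weights from a Hadamard matrix and computes the replication variance of an arbitrary estimator. It is a sketch under stated assumptions, not the SAS code used in the study: the function names are hypothetical, the Sylvester construction restricts the number of replicates to a power of two (the study used twelve replicates, which would need a different Hadamard matrix), and the variance divisor R(1 - K)^2 is the usual form for Fay's method rather than something stated in this article.

```python
def sylvester_hadamard(order):
    """Hadamard matrix with +1/-1 entries; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def fay_replicate_weights(weights, strata, groups, k=0.3):
    """Build Fay replicate weights.

    strata[i] is the stratum index (0..H-1) of observation i and groups[i]
    is its half-sample group (1 or 2) within that stratum.
    """
    n_strata = max(strata) + 1
    order = 1
    while order <= n_strata:  # need more columns than strata
        order *= 2
    h = sylvester_hadamard(order)
    replicates = []
    for r in range(order):
        rep = []
        for w, s, g in zip(weights, strata, groups):
            sign = h[r][s + 1]  # skip column 0, which is constant
            chosen = (g == 1) if sign == 1 else (g == 2)
            # Chosen half is inflated by (2 - k); the other is deflated by k.
            rep.append(w * ((2 - k) if chosen else k))
        replicates.append(rep)
    return replicates


def fay_variance(estimator, data, weights, replicates, k=0.3):
    """Replication variance: sum of squared deviations / (R * (1 - k)^2)."""
    theta = estimator(data, weights)
    ss = sum((estimator(data, rep) - theta) ** 2 for rep in replicates)
    return ss / (len(replicates) * (1 - k) ** 2)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data: three strata, two groups each (values are invented).
y = [10.0, 14.0, 7.0, 3.0, 20.0, 26.0]
w = [2.0, 2.0, 5.0, 5.0, 1.0, 1.0]
strata = [0, 0, 1, 1, 2, 2]
groups = [1, 2, 1, 2, 1, 2]
reps = fay_replicate_weights(w, strata, groups, k=0.3)
print("SE of weighted mean:", fay_variance(weighted_mean, y, w, reps) ** 0.5)
```

In practice the `estimator` argument would be replaced by a full rerun of the usual intake estimation with each set of replicate weights; the weighted mean is used here only to keep the sketch self-contained.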
To exemplify the GBHS method, consider stratified random sampling without replacement from a finite population of $N$ units divided into $H$ strata, with $N_h$ units in stratum $h$. Let $n_h$ be the sample size in stratum $h$. In the GBHS method, the sample in each stratum is first divided at random into two groups containing $n_{h1}$ and $n_{h2} = n_h - n_{h1}$ units. A set of $R$ half-samples balanced on the groups is formed as follows: let $\delta_{rh} = 1$ if group 1 in stratum $h$ is in the $r$-th half-sample and $\delta_{rh} = -1$ otherwise, with $\sum_{r=1}^{R} \delta_{rh}\delta_{rh'} = 0$ for $h \neq h'$, $r = 1, \ldots, R$. More details on this method may be found in Kish & Frankel (1968; footnote b) and Wolter14 (1985). A SAS macro was used to perform the random grouping of PSUs within each stratum.
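The random grouping step of the GBHS method can be sketched in a few lines of Python. This is an illustrative stand-in for the SAS macro mentioned above, not a reproduction of it: the function name and the PSU labels are hypothetical, and splitting each stratum as evenly as possible (the first group gets the integer part of half the PSUs) is an assumption made for the sketch.

```python
import random


def group_psus(psus_by_stratum, seed=0):
    """Randomly assign the PSUs of each stratum to group 1 or group 2.

    psus_by_stratum maps a stratum label to the list of its PSU labels.
    Returns a dict keyed by (stratum, psu) with value 1 or 2.
    """
    rng = random.Random(seed)
    assignment = {}
    for stratum, psus in psus_by_stratum.items():
        shuffled = list(psus)
        rng.shuffle(shuffled)  # random order within the stratum
        half = len(shuffled) // 2  # assumed split: as even as possible
        for i, psu in enumerate(shuffled):
            assignment[(stratum, psu)] = 1 if i < half else 2
    return assignment


# Hypothetical strata with more than two PSUs each.
example = {"S1": ["a", "b", "c", "d", "e"], "S2": ["f", "g", "h", "i"]}
for key, group in sorted(group_psus(example, seed=7).items()):
    print(key, "-> group", group)
```

Once every PSU carries a group label, each stratum behaves like a two-unit stratum and the half-sample weighting can be applied to the groups.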
We applied all of these procedures in the following example using the 2008-2009 Brazilian Dietary Survey, which has a complex sample design. This survey was performed as part of the 2008-2009 Household Budget Survey (HBS), which was conducted by the Brazilian Institute of Geography and Statistics (IBGE, Instituto Brasileiro de Geografia e Estatística) using a two-stage sampling process. In the first stage, PSUs were selected according to the number of households in each unit, and in the second stage, the households were selected by simple random sampling. The individual dietary intake survey was conducted in 24% of the households that were selected for the 2008-2009 HBS. Two non-consecutive days of food records were collected from 34,003 individuals. For this analysis, 1,254 women who were pregnant or lactating were excluded, resulting in a final sample size of 32,749 individuals. The number of individuals with each day of food record for energy and total fruit, by sex and age group, is shown in Table 1.

The application of the NCI method differs for dietary components that are consumed daily and those that are episodically consumed. We exemplified both situations by analyzing total energy (kcal) and fruits (g). The means, standard errors of the means, and intakes at the 10th, 50th, and 90th percentiles with their standard errors were estimated by sex/age group for the two examples, comparing estimates based on simple random sampling (considering only the weights) with those based on complex sampling (considering the weights and the complexity of the sample design). For complex sampling, standard errors were estimated using Fay's BRR method, with twelve sets of BRR replicate weights generated with a factor of 0.3. As an additional step, the sampling weights were rounded before being passed to the macros, because the MIXTRAN macro can only work with integer sampling weights. The weights were post-stratified to control totals in each replicate.

The models were fitted to the data using the statistical software package SAS (Version 9.2).
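The weight preparation described above (post-stratification to control totals in each replicate, then rounding to integers) could look roughly like the following Python sketch. It is only an illustration: the function names, the cell labels, and the control totals are hypothetical, and applying the rounding after the post-stratification adjustment is an assumption about the order of operations.

```python
import math


def poststratify(weights, cells, control_totals):
    """Scale weights so that they sum to the control total in each cell."""
    sums = {}
    for w, c in zip(weights, cells):
        sums[c] = sums.get(c, 0.0) + w
    return [w * control_totals[c] / sums[c] for w, c in zip(weights, cells)]


def prepare_replicate(weights, cells, control_totals):
    """Post-stratify one set of replicate weights, then round to integers."""
    adjusted = poststratify(weights, cells, control_totals)
    # Round half up; integer weights are needed downstream.
    return [int(math.floor(w + 0.5)) for w in adjusted]


# Invented replicate weights, post-stratification cells, and control totals.
replicate = [1.2, 2.8, 4.0]
cells = ["women", "women", "men"]
totals = {"women": 8.0, "men": 6.0}
print(prepare_replicate(replicate, cells, totals))
```

Because rounding changes the weights slightly, the rounded weights will usually reproduce the control totals only approximately; with survey weights in the hundreds or thousands the discrepancy is small.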
More than 50% of the individuals in each sex-age group reported no consumption of fruits, which indicates how episodic fruit consumption is. For energy, more than 96% of the individuals in each sex-age group provided two days of records (Table 1). As expected, point estimates were similar when complex sampling and simple random sampling were compared, but there was an important difference in the standard errors, which varied across subpopulations. The percentiles of usual intake were also quite similar because the NCI method corrects for within-person variability (Table 2).

DISCUSSION
The NCI method is one way of estimating usual intake.
Recently, four methods (the Iowa State University method, ISU; the National Cancer Institute method, NCI; the Multiple Source Method, MSM; and the Statistical Program for Age-adjusted Dietary Assessment, SPADE) were compared, and all of them provided similar estimates of usual food intake. Nevertheless, when a nutrient has high within-person variation or a highly skewed distribution, and when the sample size is small, the estimates could be biased.10 The limitation of the NCI and other methods is the assumption that the 24HR or food records are an unbiased instrument for measuring food intake. Studies with doubly labeled water have found misreporting of energy intake on the 24HR, almost always in the direction of underreporting.11 This observation suggests that at least some foods are underreported as well.10,11

Greater understanding of the effect of each sampling characteristic on the results should encourage researchers to use adequate methods for data analysis.

The Brazilian Dietary Survey is stratified to account for variation in intakes over the entire year, and stratification has an important effect not only on weighting but also on the degrees of freedom of the analysis. Clustering at the PSU level is the most important factor, more than doubling the SE. On the other hand, the NCI process of accounting for intra-individual variability in food intake, together with weighting, corrected almost all of the skewness in the subpopulation that was analyzed.
Valid estimation of food and energy intake for the population must account for all of the sampling characteristics of the surveys because when these characteristics are neglected, statistical analysis might produce underestimated standard errors that would compromise the results and the survey's conclusions.

Table 1. Number of individuals in each day of food record for energy and total fruit by sex and age group. Brazilian Dietary Survey 2008-2009.
Women who were pregnant or lactating were excluded.

Table 2. Mean (SE) and percentiles (SE) of usual intake according to the sample design assumed for the survey. Brazilian Dietary Survey 2008-2009.
SE: standard error. Women who were pregnant or lactating were excluded.

a Usual Dietary Intakes: SAS Macros for Analysis of a Single Dietary Component. [cited 2013 Mar 6]. Available from: http://riskfactor.cancer.gov/diet/usualintakes/macros.html
b Kish L, Frankel MR. Balanced repeated replication for analytical statistics. In: Proceedings of the Social Statistics Section, 2-11. 1968. New York, United States. New York: American Statistical Association; 1968.