Multivariate estimate of eating patterns: is the whole different from the parts?

REV BRAS EPIDEMIOL 2020; 23: E200063

ABSTRACT: Objective: To describe the correlations between eating patterns estimated for the pooled years 2007 to 2012 and for each year of that period. Method: Cross-sectional study with data from the System of Surveillance of Risk and Protection Factors to Chronic Diseases by Telephone Survey (Vigitel), with the selection of 167,761 individuals aged 18 to 44 years. Eating patterns were identified with Principal Component Analysis. To compare the effects of the extraction and estimation of eating patterns among different surveys, we conducted the following analyses: in the first, we used the total data set for the years 2007 to 2012; in the second, the patterns were estimated in each annual data set for the same period. Steps 1 and 2 were performed with no rotation, with Varimax rotation and with Promax rotation. After extracting the patterns, standardized scores with zero mean were generated for each pattern. The association between the patterns generated in the analyses was estimated by the Pearson correlation coefficient (r). Results: In the non-rotated analyses, the components retained in the pooled set presented correlations higher than 0.90 with the patterns retained in each year. In the rotated analyses, only the first component had correlations higher than 0.90. Conclusion: Estimates of eating patterns, whether segmented (year by year) or general (all years together), showed high correlation and consistency between the patterns identified when drawn from the same data pool.


INTRODUCTION
An analysis of dietary patterns is preferable to describing diets by type of food or nutrient, because food consumption is determined by multiple factors, and food choices and their nutrients do not occur randomly 1. Because food consumption is not random and foods and nutrients are correlated, the study of diets through patterns has become widespread 1.
In general, analyses that compare dietary patterns estimated through multivariate analysis between two or more surveys are conducted in each period separately, which makes it difficult to compare the patterns. This is because the composition of the patterns and their order of importance in explaining the variability change according to how the data set is treated. Alternatively, it is possible to estimate the patterns in the total set of surveys and then calculate the scores for each pattern according to the periods, or other strata in the data set.
The aim of this study was to describe the correlations between dietary patterns for the set of years from 2007 to 2012 and for each year in the same period.

METHODS
In this study, 167,761 individuals aged 18 to 44 years were selected. The food consumption variables selected were: weekly frequency of consumption of beans, vegetables, raw vegetables, cooked vegetables, red meat, chicken, fruits, soft drinks or artificial juice, and milk; daily vegetable consumption; and consumption of visible fat.

Dietary patterns were identified with Principal Component Analysis (PCA). PCA is a factor-analytic technique that reduces data into patterns based on the correlations between the variables 3. The first principal component corresponds to the direction of greatest variance, and each subsequent component is orthogonal to the previous ones 4. Rotations are used to improve the interpretation of the extracted components. Varimax, an orthogonal rotation, maximizes the variance of the factor loadings, and the components remain uncorrelated. Promax, an oblique rotation, rotates the axes so that they can form angles other than 90 degrees. In this type of rotation, some correlation between the components cannot be ruled out 5.
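The extraction and the orthogonal rotation described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the original analyses were run in Stata, the function names `pca_loadings` and `varimax` are hypothetical, the data are synthetic rather than Vigitel records, and the Varimax routine is the common SVD-based iteration, not necessarily the one Stata uses. Promax is not shown.

```python
import numpy as np

def pca_loadings(X):
    """Unrotated PCA loadings from the correlation matrix of X (rows = individuals)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending variance
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    return eigvals, eigvecs * np.sqrt(eigvals)    # loading = eigenvector * sqrt(eigenvalue)

def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonal Varimax rotation (SVD-based iteration) of a loading matrix."""
    p, k = loadings.shape
    T = np.eye(k)                                 # rotation matrix, starts as identity
    crit_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        col_ss = (rotated ** 2).sum(axis=0)       # sum of squared loadings per component
        grad = loadings.T @ (rotated ** 3 - rotated * col_ss / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt                                # nearest orthogonal matrix to the gradient
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return loadings @ T, T

# Synthetic example (assumption): 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + rng.normal(size=(1000, 6))

eigvals, loadings = pca_loadings(X)
rotated, T = varimax(loadings[:, :2])
```

Because `T` is orthogonal, the rotation redistributes the loadings among the retained components without changing each variable's communality, which is why rotated components can rank and combine foods differently from the unrotated ones.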
To compare the effects of extraction and estimation of dietary patterns between different surveys, we conducted the following analyses:
• in the first, we used Vigitel's total data set for the years 2007 to 2012;
• in the second, the patterns were estimated in each Vigitel annual data set for the period from 2007 to 2012.
Steps 1 and 2 described above were performed with no rotation, with Varimax rotation and with Promax rotation. In the analyses, the components with eigenvalues > 1.0 were retained, according to the Kaiser rule 5. We considered the number of patterns retained in the first step. After extracting the patterns, standardized scores with a mean of zero were calculated for each one, so that each individual received a standardized value representing their adherence to each of the patterns analyzed. The patterns were named according to their order of retention, that is, the first pattern was named CP1, the second CP2, and so on. The association between the patterns generated in the analyses described above was estimated by Pearson's correlation coefficient (r). The analyses were conducted using the Stata program (Stata Corporation, College Station, United States).
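The comparison between the pooled and the year-by-year estimates can be illustrated with a short Python sketch of the unrotated case. It is a hedged illustration, not the authors' Stata code: the helper `pca_scores` is hypothetical, the data are simulated with an arbitrary two-factor structure, and absolute correlations are reported because the sign of a principal component is arbitrary. The numbers it produces say nothing about the Vigitel results.

```python
import numpy as np

def pca_scores(X, n_components=None):
    """Standardized scores (mean 0, SD 1) from an unrotated correlation-matrix PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is None:
        n_components = int((eigvals > 1.0).sum())      # Kaiser rule: eigenvalue > 1.0
    scores = Z @ eigvecs[:, :n_components]
    scores = scores / scores.std(axis=0, ddof=1)       # rescale to unit SD
    return scores, n_components

# Simulated data (assumption): 6 "years" of 1000 individuals, 8 food variables,
# two latent patterns loading on variables 0-4 and 5-7 respectively.
rng = np.random.default_rng(42)
years = np.repeat(np.arange(2007, 2013), 1000)
latent = rng.normal(size=(years.size, 2))
mixing = np.zeros((2, 8))
mixing[0, :5] = 0.8
mixing[1, 5:] = 0.8
X = latent @ mixing + rng.normal(scale=0.6, size=(years.size, 8))

# Step 1: patterns estimated in the pooled data set.
pooled_scores, k = pca_scores(X)

# Step 2: patterns estimated in each year, keeping the number retained in step 1.
corr = {}
for year in np.unique(years):
    mask = years == year
    yearly_scores, _ = pca_scores(X[mask], n_components=k)
    corr[int(year)] = [
        abs(np.corrcoef(pooled_scores[mask, j], yearly_scores[:, j])[0, 1])
        for j in range(k)
    ]

for year, rs in corr.items():
    print(year, [round(r, 3) for r in rs])
```

Each row printed corresponds to one year, with one correlation per retained pattern (CP1, CP2, and so on), which mirrors the layout one would expect of a pooled-versus-annual comparison table.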
Vigitel was approved by the National Human Research Ethics Commission of the Ministry of Health 2 . For Vigitel, free and informed consent was obtained orally at the time of telephone contact with the interviewees. The present study was assessed and approved by the Research Ethics Committee of the School of Public Health of the Universidade de São Paulo under Report number 1,885,826 of January 5, 2017.
This article derives from the master's dissertation of the author Iolanda Karla Santana dos Santos, entitled "Patterns of food consumption and physical activity based on data from VIGITEL", presented to the Graduate Program in Nutrition in Public Health of the School of Public Health of the Universidade de São Paulo.

RESULTS
Table 1 shows the correlations between the patterns retained in the 2007 to 2012 set and those retained for each year of the same period, with no rotation and with the Varimax and Promax rotations. In the analyses with no rotation, the components retained in the 2007 to 2012 set showed correlations greater than 0.90 with the patterns retained in each year separately. In the rotated analyses, only the first component showed correlations greater than 0.90 in all of the years.

DISCUSSION
Our results indicate that:
• PCA can be used in time-series data sets with the same sample structure;
• depending on the purpose of the study, it is not advisable to use Varimax or Promax rotation after retaining the components.
In this study, with six years of monitoring and retention of patterns with eigenvalues > 1.0, the correlations between the patterns retained in the pooled set and those retained for each year, with no rotation, were greater than 0.90, showing high internal consistency. Regarding the patterns that were not retained in the comparative analyses, some pairs showed correlations below 0.90.
In an expanded analysis (data not shown; available from the authors on request), in which we included all of the years of monitoring, the correlations between some of the patterns extracted from the pooled set and the equivalent patterns extracted from the databases separated by year were less than 0.90. This lower association reflects changes in the consumption of diet components distributed across the population, which are relevant to interpreting changes in dietary patterns. In this case, without analyzing the databases together, it would be impossible to interpret the changes in dietary patterns that occurred in the period.