The identification of food patterns: a comparison of principal component and principal axis factoring techniques

Introduction: Dietary patterns are based on the concept that foods consumed together are as important as the intake of individual foods or nutrients. Objectives: To identify dietary patterns in a sample of nursing professionals and to explore the differences between the patterns found using two techniques: principal components (PC) and principal axis factoring (PAF). Method: This report is based on data from 309 members of the nursing team at a public hospital in Rio de Janeiro. A 24-hour dietary recall was used, resulting in 24 food groups. To identify the dietary patterns, we applied multivariate analysis, specifically PC and PAF, each followed by a Varimax orthogonal rotation. Results: The Cattell scree test indicated three factors to be extracted. The communality varied between 0.41 and 0.76. Loads higher than 0.30 were considered in the pattern composition. Both methods identified a similar first dietary pattern, called the traditional pattern. The other two patterns were named healthy and snacks, and their order was inverted between the two techniques. Conclusion: The observed differences concern the number of food groups that enter the composition of components and factors, the smaller size of the loads in the PAF, and the order of the dietary patterns, especially those derived from loads of a smaller magnitude. However, these differences do not seem to affect the interpretability of dietary patterns in this population.


INTRODUCTION
Research in nutritional epidemiology has long been interested in estimating habitual food intake in order to assess whether food consumption is in compliance with dietary recommendations and in order to relate it to health parameters. However, since 1969, interest has arisen in investigating the effect of foods by thinking about how they are actually consumed in various combinations, called dietary patterns 1 .
Dietary patterns are conceptually based on the fact that food consumed together is as important as the consumption of food or nutrients alone. In addition, it has been an alternative approach to study the relationship between nutrition and disease 2 .
In the identification of dietary patterns, statistical methods of multivariate analysis have been used, among them principal components (PC) and exploratory factor analysis. An important step in the application of these methods is the factor extraction technique. Although several methods are available, "factor extraction using principal components" is the most common method applied in nutritional epidemiology 3 .
In general, food consumption data do not have a multivariate normal distribution and are subject to substantial errors in the estimates of habitual consumption, because individuals have difficulty in reporting their food intake accurately 4 . Thus, in exploratory factor analysis, the technique of principal axis factoring (PAF) 5,6 could be applied in the extraction of factors since, like PC, it does not assume a probability distribution for the food consumption variables. In addition, this technique allows the portion of food group (FG) variability that is considered to be measurement error to be excluded from the dietary pattern identification process.
The present article aims to identify the dietary patterns in a sample of nursing professionals and to explore the difference between the patterns found using two techniques, PC and PAF.

DESIGN AND STUDY POPULATION
The data used in this study were obtained in the second phase of the longitudinal study "Night work and risk factors for cardiovascular diseases: in nursing teams", from February to July 2013, which was coordinated by the Laboratory of Education in Environment and Health of the Oswaldo Cruz Foundation (FIOCRUZ). It was approved by the Fiocruz Ethics Committee at the Oswaldo Cruz Institute (IOC) nº 635/11, of 2012.
With 309 participants and 24 FG, the ratio of participants to variables (309/24) is approximately 12.9, which is above the minimum of 10 considered appropriate for the application of the factorial method 5 .

GENERAL AND ANTHROPOMETRIC EVALUATION
Participants were invited to respond to a multidimensional questionnaire in a face-to-face interview. Height was measured with the Alturexata stadiometer (Belo Horizonte, MG, Brazil), and body weight was measured with a Tanita scale (Tokyo, Japan).

FOOD CONSUMPTION ASSESSMENT
The 24-hour food recall (R24h) was answered on two non-consecutive days and with a seven-day interval, by all nursing professionals, with the support of a photo album of utensils and food portions 7 .
Culinary preparations, beverages and processed foods consumed in the period preceding the interview were recorded in household measurements, and the brands of the product manufacturers were requested. To digitize the information, food and beverages were converted to grams (g) and milliliters (mL), and were analyzed according to their nutritional composition and classified according to the groups described in the Table of Measures for Food Consumption in Brazil (TMRAC) 8 . The composition of the processed foods was analyzed according to the nutritional label available on the websites.
The R24h was entered once in Excel spreadsheets. For quality control, two nutritionists independently compared every original food recall with the spreadsheet records.

STRATEGIES FOR GROUPING FOODS IN ORDER TO MAKE FOOD GROUPS
The formation of FG for pattern analysis is not a trivial task. Because the food combinations become the variables of the multivariate analysis, this is an important step in identifying patterns. This phase requires several attempts to combine the reported foods. The first basic rule in the formation of FG is to look at the similarity of the nutritional composition of the foods 9, 10 .
In addition to combining foods to form FG, another approach is to exclude food that does not fit the criterion of nutritional similarity or low consumption frequency in the study population 11,12 . When grouping foods, the literature shows studies that vary from 20 10 to 67 13 FG, but the studies' authors do not always make clear the method used for forming the groups.
The method of grouping foods used in this study was performed respecting nutritional similarity and the groups indicated in the TMRAC, resulting in 24 FG, out of a list of 459 foods.
Before initiating the multivariate analysis, FG were subjected to the reduction of intrapersonal variability of information using Multiple Source Method (MSM) software, which aims to estimate the distribution of usual consumption of the population in question 14 .

Initial assessment of the data's suitability for the technique
The Kaiser-Meyer-Olkin index (KMO) was applied to evaluate the adequacy of the data in relation to the factorial analysis. Acceptable values range from 0.50 to 1.00, but it is not a limiting parameter for the continuity of the analysis 14 .
The Bartlett sphericity test evaluates the correlations between the variables and is very sensitive to the size of the sample. In this test, we expect to reject the null hypothesis that the correlation matrix is an identity matrix 15 .
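As an illustration of the two checks described above, the following is a minimal sketch in Python with NumPy. It is not the SPSS procedure used in this study: the function names, the simulated data and the random seed are assumptions made only for the example. The overall KMO index is computed from the correlation matrix and the partial correlations (obtained from its inverse), and Bartlett's statistic from the determinant of the correlation matrix.

```python
import numpy as np

def kmo_index(R):
    """Overall Kaiser-Meyer-Olkin index from a correlation matrix R."""
    R_inv = np.linalg.inv(R)
    # Partial correlations are obtained by rescaling the inverse of R.
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal elements only
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

# Simulated data standing in for 309 participants and 24 food groups
# (hypothetical values, not the study data).
rng = np.random.default_rng(0)
n, p = 309, 24
latent = rng.normal(size=(n, 3))
weights = rng.normal(scale=0.5, size=(3, p))
X = latent @ weights + rng.normal(size=(n, p))

R = np.corrcoef(X, rowvar=False)
kmo = kmo_index(R)
chi2, df = bartlett_sphericity(R, n)
print(f"KMO = {kmo:.2f}; Bartlett chi2 = {chi2:.1f} on {df} df")
```

The p-value of Bartlett's test would be read from the upper tail of a chi-square distribution with the returned degrees of freedom; that step is left out here to keep the sketch dependent on NumPy alone.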

Number of components / factors to be extracted
The Cattell graph, or scree plot, was used to identify the number of factors to be extracted. This graph plots the eigenvalues (ordinate) against the number of components (abscissa); the first point corresponds to the highest eigenvalue, which belongs to the first component to be retained. The cutoff point in the curve, known as the elbow, indicates the point beyond which adding more components to the analysis brings no relevant gain in explained variance 16,17 . Ruscio and Roche 18 argue that an elbow is not always detected by visual inspection, in which case a subjective judgment is required to identify the cut-off point.
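The quantities plotted in a scree plot can be sketched as follows. This is an illustrative example in Python with NumPy, using simulated data; the function name `scree_eigenvalues` and the data are assumptions and do not reproduce the study results. The eigenvalues of the correlation matrix are sorted in descending order, and the drop between consecutive eigenvalues is what the eye looks for when locating the elbow.

```python
import numpy as np

def scree_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, in descending order."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)[::-1]

# Simulated intake data (hypothetical): 309 participants, 24 food groups.
rng = np.random.default_rng(1)
latent = rng.normal(size=(309, 3))
weights = rng.normal(scale=0.5, size=(3, 24))
X = latent @ weights + rng.normal(size=(309, 24))

eigvals = scree_eigenvalues(X)
share = eigvals / eigvals.sum()    # proportion of total variance
drops = eigvals[:-1] - eigvals[1:] # gap between consecutive eigenvalues

# Values one would plot: component number (x) against eigenvalue (y).
for k in range(6):
    print(f"component {k + 1}: eigenvalue {eigvals[k]:.2f}, "
          f"variance share {share[k]:.1%}, drop to next {drops[k]:.2f}")
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by 24 gives the share of total variance attributed to that component.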

Extraction of Factors/Components
The PC technique is more widely used than PAF in food pattern analysis 19 ; however, differences between these techniques are rarely discussed in the literature. The PC analysis is based on the total FG variance and does not distinguish between specific variance and common variance. In PAF, on the other hand, the specific part of FG variability is not included in the derivation of the factorial structure 17 .
The specific part refers to the portion of the variance of an FG that is not shared with any other FG and is unique to the FG in question. The common variance indicates the amount of FG variability that is shared with the others, known as communality.
Since the PAF technique aims to reveal latent constructs (variables that cannot be directly observed) that explain the covariance between items, the specific variances (the individual portions of the items), which do not covary with each other, are not considered in the model 19 .
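The decomposition described above can be written compactly. The notation below is the usual one for the orthogonal common factor model and is an assumption of this sketch, not notation taken from the study: $x_j$ is a standardized FG, $F_k$ are $m$ uncorrelated factors, $\lambda_{jk}$ are the loads, $h_j^2$ is the communality and $u_j^2$ the specific variance.

```latex
\begin{align}
x_j &= \sum_{k=1}^{m} \lambda_{jk} F_k + \varepsilon_j, \\
\operatorname{Var}(x_j) &= 1 = \underbrace{\sum_{k=1}^{m} \lambda_{jk}^{2}}_{h_j^2\ \text{(common)}} + \underbrace{u_j^{2}}_{\text{specific}}, \\
\text{share of total variance explained by factor } k &= \frac{1}{p} \sum_{j=1}^{p} \lambda_{jk}^{2}.
\end{align}
```

In these terms, PC analyzes the correlation matrix with ones on the diagonal (total variance), whereas PAF replaces the diagonal with estimates of $h_j^2$, so that only the common part enters the extraction.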
Another aspect concerns the loads (the correlation of the variable with the component) and communalities resulting from the application of the two methods. In the PC, the loads and communalities tend to be higher than those obtained from the PAF. Consequently, the same occurs with the percentage of FG variance explained by the components. These differences are less pronounced when the share of common variance is high (compared to the specific variance) for most variables. According to Hair et al. 15 , in most cases PC and PAF reach the same results if the number of variables exceeds 30 and if the communalities exceed 0.60 for most variables.
Costello and Osborne 5 recommend care in choosing the factor analysis technique, since several methods are available, each with its own objectives and assumptions, and information on the strengths and weaknesses of these techniques is not always presented clearly. Although the PC and the PAF are both data reduction techniques, they reduce data in different ways with regard to the treatment given to the variance portions of the variables included in the analysis. In addition, the PC is not classified as a factor analysis technique, as it is more appropriate for dimensionality reduction, whereas factor analysis is more adequate when the objective is to identify latent structures or constructs 5,20 .
In this study, dietary patterns were derived using the PC and PAF, both of which are appropriate for data without a multivariate normal distribution. The loads and communalities of each food group were obtained, as well as the percentage of explained variance.
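The contrast between the two extraction methods can be illustrated with a minimal NumPy sketch. It is an assumption-laden example, not the SPSS implementation used in this study: the function names are hypothetical, the data are simulated, and the PAF routine uses squared multiple correlations as starting communalities with a simple iteration until the communalities stabilize.

```python
import numpy as np

def top_k_loadings(M, k):
    """Unrotated loads from the k largest eigenvalues of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues (possible for a reduced matrix) are set to zero.
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def pc_loadings(R, k):
    """Principal components: ones on the diagonal, total variance analyzed."""
    return top_k_loadings(R, k)

def paf_loadings(R, k, n_iter=200, tol=1e-6):
    """Principal axis factoring: communalities on the diagonal."""
    # Starting communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        L = top_k_loadings(R_reduced, k)
        h2_new = np.sum(L ** 2, axis=1)
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L

# Simulated data (hypothetical): 309 participants, 24 food groups.
rng = np.random.default_rng(2)
latent = rng.normal(size=(309, 3))
weights = rng.normal(scale=0.5, size=(3, 24))
X = latent @ weights + rng.normal(size=(309, 24))
R = np.corrcoef(X, rowvar=False)

pc = pc_loadings(R, 3)
paf = paf_loadings(R, 3)
pc_h2 = np.sum(pc ** 2, axis=1)    # communalities under PC
paf_h2 = np.sum(paf ** 2, axis=1)  # communalities under PAF
print(f"explained variance, PC:  {pc_h2.sum() / 24:.1%}")
print(f"explained variance, PAF: {paf_h2.sum() / 24:.1%}")
```

Since the reduced matrix differs from the correlation matrix only by having smaller diagonal entries, its leading eigenvalues cannot exceed those of the full matrix, which is one way to see why the PC figures come out higher.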
In the interpretation of eating patterns, a cut-off point of |0.3| was adopted. Thus, FG with loads > 0.3 contributed directly to the pattern, while FG with loads < -0.3 correlated negatively with the food pattern. In both methods, Varimax orthogonal rotation 21 was applied. It was decided not to exclude FG with low loads, nor those with cross loads (above the cut-off point in more than one component/factor), in order to compare FG behavior in both methods.
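The rotation and cut-off steps can be sketched as follows in Python with NumPy. The `varimax` function is a commonly used SVD-based formulation of the raw Varimax criterion, without the Kaiser normalization that statistical packages often apply by default, so its output need not match SPSS exactly. The loading matrix and the labels FG1 to FG5 are hypothetical.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Raw Varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    T = np.eye(k)  # orthogonal rotation matrix, updated iteratively
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        grad = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt
        crit_new = np.sum(s)
        if crit_new < crit * (1 + tol):
            break
        crit = crit_new
    return L @ T

# Hypothetical unrotated loads for five food groups on two factors.
names = ["FG1", "FG2", "FG3", "FG4", "FG5"]
L = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.3, 0.7],
              [0.2, 0.6],
              [0.1, 0.1]])

rotated = varimax(L)
salient = np.abs(rotated) > 0.30       # cut-off of |0.3| used in the text
cross_loaded = salient.sum(axis=1) > 1  # above the cut-off on more than one factor

for name, row, flags, cross in zip(names, rotated, salient, cross_loaded):
    kept = [f"factor {j + 1} ({row[j]:+.2f})" for j in np.flatnonzero(flags)]
    note = " [cross load]" if cross else ""
    print(f"{name}: {', '.join(kept) if kept else 'below cut-off'}{note}")
```

An orthogonal rotation redistributes variance among the factors but leaves each communality unchanged, so an FG whose communality is very low cannot reach the cut-off on any factor after rotation.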
The Statistical Package for the Social Sciences software (SPSS) version 20.0 was used for determining and extracting the number of components/factors.

RESULTS
Regarding the characteristics of the interviewees, the majority were female (85.8%), and the mean age was 43.7 years (standard deviation, SD = 11.5). Regarding race/skin color, 40.5% of the professionals said they were white/yellow. The majority had completed a university degree (60.9%). The nursing assistant position had the highest concentration of professionals (49.5%). Regarding the net family income, 1.3% were in the lower range (R$ 901 to 1,800) and 17.8% earned between R$ 5,401 and R$ 7,200. The nutritional diagnosis of the nursing professionals indicated that 63.8% were overweight.
The KMO test (0.52) and Bartlett's test (p < 0.001) indicated acceptable values for applying the proposed techniques. The scree plots (Graphs 1 and 2) indicated three components and an equal number of factors.
In the PC technique, the three extracted components each explain about 8% of the variance of the food groups and, together, 23.47% of the total variance (Table 1). The composition of the components suggests that the first component reflects a traditional food pattern; the second, a healthy food pattern; and the third, a snack food pattern. Seven FG did not reach the cut-off point and therefore did not contribute to the interpretation of the food patterns derived by PC (legumes and pea, garbanzo bean, and lentil soups; roots and tubers; cereal, popcorn, corn, and granola; eggs; milk and cheese products; natural juices; and vegetable oils). The communalities of these FG were also low (< 0.20).
In the PAF technique, the three factors together explain approximately 15% of the total variance of FG, about 10% less variability than the components. Individually, each factor accounts for 4.5 to 6% of FG variance. Among the 24 food groups included, 14 presented no loads higher than the cutoff point: in addition to the seven FG mentioned for the PC extraction, these were potatoes; bananas and oranges; sweets and desserts; fish and seafood; white and whole grain breads; cakes and cookies; and cold cuts and sausages. These FG also had low communalities (< 0.15).
It should be emphasized that a greater number of FG enter the pattern derivation obtained by the PC. Differences between the techniques occur in FG with intermediate loads, with absolute values between 0.15 and 0.3, which have no relevance in the interpretation of patterns. For example, in the healthy pattern, eight groups had loads above |0.3| in the PC, while in the PAF there were five. The following FG fell below the cutoff in the PAF: bananas and oranges (0.243), sweets and desserts (-0.187) and cold cuts and sausages (-0.213).
Considering that the interpretation of a food pattern is largely given by the FG with the highest loads, positive or negative, the two methods do not differ in the naming of the patterns when this criterion is adopted. The two techniques identified similar eating patterns; the first pattern was the same in both and was termed the traditional pattern. The other two patterns were named healthy and snacks, though they did not appear in the same order in the two techniques.

DISCUSSION
This study allowed us to compare dietary patterns derived by means of two extraction techniques. It should be noted that the loads derived by the PC are higher than those obtained by the PAF. Consequently, the parameters calculated by the PC will be higher than those obtained from the PAF. The inclusion of the specific portion of variability in the derivation of the structure explains these larger loads in the PC.
Among the 24 FG included in the analysis, seven presented no loads above the cutoff point in either method (legumes and soups, roots and tubers, cereals, eggs, dairy and milk products, natural juices, and vegetable oils). From a more conservative perspective, it would be advisable to exclude them from the analysis, one at a time, and carry out a new analysis. For our purposes of comparing techniques, we decided to keep them.
In this study, in each method, the percentages of the variance explained by the components/factors are close. That is, it is observed that no component or factor explained a more relevant portion of total variability than another. In general, the first component, or factor, explains a higher % of the variability of the original variables when compared to the others. Such behavior did not occur in these data, possibly because the correlations between FG are of moderate to low magnitude, which can be reflected in the borderline KMO test.
Arruda et al. 2 analyzed an adult food pattern in the cohort of Ribeirão Preto, São Paulo, and found a total variance explained of 20.92% in the extraction of four factors. In the study by Olinto et al. 22 , with young adults from the city of Pelotas, Rio Grande do Sul, analyses of dietary patterns explained 15.7% of the total variance. We found studies with an explained variance of 9% 21 , as well as studies with values greater than 39% 23 . We did not find, in the literature on dietary patterns, parameters that indicate the desired percentage of variance explained. For Hair et al. 15 , 60% would be an acceptable value for accumulated variance, since the higher the cumulative percentage, the greater the variability of the original data that will be preserved in later analyses with the derived components.
In general, in our study, the loads derived by the PAF have a lower magnitude than, but a similar pattern to, those obtained by the PC. Groups with a greater load in a certain component maintained this behavior in the PAF. Regarding the direction of the association, the FG kept the same sign in the two extraction methods. A higher number of FG presented a load below the cutoff point in the PAF. The inversion of the healthy and snack food patterns between the extraction techniques is noteworthy: snacks appear as the third component in the PC, while in the PAF they form the second factor. These results reflect the fact that the PAF is based on the portion of variance that each FG shares with the others, excluding the specific portion. On the other hand, the low correlation between the FG may have contributed to this result by making it less clear which FG covary, or go together. However, since it is possible to assume that the pattern of correlations or covariances observed between the FG is due to cultural and behavioral processes combined with the sociodemographic characteristics of the population, which cannot be directly observed, the PAF can be used to identify the structure underlying the data that generates the observed correlation pattern. The specific part of the variability is explained by external sources and is therefore considered a measurement error in the food pattern identification process.
In the literature review, we did not find articles on food patterns among nursing professionals in Brazil, which limited the possible comparisons with our findings. Fernandes et al. 24 , studying working days and health behavior among Brazilian nurses, in a population of 87.3% women, reported that, compared to female nurses, male nurses had less healthy behaviors, such as a higher consumption of alcoholic beverages, coffee and fried foods, and a lower consumption of fruits and vegetables. In this context, where the majority of the participants in our study were also female, it is possible to assume that the first, traditional food pattern in both techniques indicates that nursing professionals adopt the same food pattern as the Brazilian population. In the 2008-2009 National Food Survey (Inquérito Nacional de Alimentação -INA), the foods most consumed were rice, coffee, beans, bread, and beef 25 .
Sichieri et al. 11 analyzed data from the Life Standards Survey (Pesquisa sobre Padrões de Vida -PPV) in the Northeast and Southeast regions of Brazil, identifying in the traditional pattern, rice, beans, flour and sugar, foods that represent the food culture of the Brazilian population. Results similar to ours were found in studies by Gimeno et al. 10 and Arruda et al. 2 , both in Ribeirão Preto, in which the popular/traditional pattern included beans, cereals, vegetable fat and beef.
The so-called healthy pattern comprises the FG vegetables and legumes; greens; fruits (except bananas and oranges); and bananas and oranges. Arruda et al. 2 describe the healthy pattern as vegetables, fruits, peas and other legumes, fish, cassava and polenta, chicken and cereals. Gimeno et al. 10 describe the healthy pattern as greens, fruits and skimmed dairy products. The low participation of fruits and vegetables in the Brazilian basic diet was pointed out in the INA 2008-2009 survey.
Our findings showed added sugar, white and whole-grain breads, cakes and cookies, and tea and coffee in the snacks pattern. In the analyses of Hoffmann et al. 26 , the snacks pattern was made up of cake, pizza, and cuca. The change in eating patterns seems to be a result of work routines combined with eating outside the home. In this sense, the consumption of fast food, the use of processed foods and the easy availability of these foods contribute to their consumption 27 .
In the comparison of the patterns identified in this study with those of other studies 2,10-12,22,23,26 , most of the foods are similar, but the authors name the patterns subjectively and differently, choosing diverse FG as markers in naming each pattern, which makes it difficult to compare the findings. It is assumed that the traditional pattern of the Brazilian population is the most commonly described.

CONCLUSION
Much is discussed about the characteristics of the methods of factor extraction in the specialized literature, but we did not locate studies empirically comparing the performance of these techniques in the identification of dietary patterns. In this study, the observed