Empirically Derived Dietary Patterns: Interpretability and Construct Validity According to Different Factor Rotation Methods


Abstract
This study aimed to investigate the effects of factor rotation methods on interpretability and construct validity of dietary patterns derived in a representative sample of 1,102 Brazilian adults. Dietary patterns were derived from exploratory factor analysis. Orthogonal (varimax) and oblique rotations (promax, direct oblimin) were applied. Confirmatory factor analysis assessed construct validity of the dietary patterns derived according to two factor loading cut-offs (≥ |0.20| and ≥ |0.25|). Goodness-of-fit indexes assessed the model fit. Differences in composition and in interpretability of the first pattern were observed between varimax and promax/oblimin at cut-off ≥ |0.20|. At cut-off ≥ |0.25|, these differences were no longer observed. None of the patterns derived at cut-off ≥ |0.20| showed acceptable model fit. At cut-off ≥ |0.25|, the promax rotation produced the best model fit. The effects of factor rotation on dietary patterns differed according to the factor loading cut-off used in exploratory factor analysis.

Introduction
Exploratory factor analysis (EFA) is a multivariate statistical method that has been used in nutritional epidemiology as a data-driven approach to derive dietary patterns. Dietary pattern analysis is of growing interest because it provides valuable and comprehensive information about the overall diet 1, accounting for the synergistic relation between a myriad of foods and nutrients consumed 2,3.
From a statistical perspective, EFA is concerned with modeling the covariance among observed variables in order to identify the latent constructs or factors underlying these variables 4. In dietary pattern analysis, EFA combines, into a factor, food variables that are correlated to each other, but are independent of the other subset of variables 5. The strength with which an observed variable correlates to a factor is measured by its factor loading 6.
In order to simplify the factor structure (i.e., matrix of factor loadings) and improve the interpretability of the factor, a rotation method is usually applied after the extraction of a subset of factors 5. A simple factor structure is achieved when each variable loads highly on as few factors as possible and the loadings of the variables across the factors (cross-loadings) are approximately zero 7,8.
In dietary pattern analysis, the orthogonal varimax rotation has been the most commonly used rotation method 9,10. Orthogonal rotation leads to uncorrelated factors that are considered simpler and easier to interpret 8,10, whereas oblique (nonorthogonal) rotations, such as promax and oblimin, allow the factors to correlate. Correlated factors are considered harder to interpret and, for this reason, oblique rotations have been used less often in studies involving dietary pattern analysis 11,12,13,14,15,16,17.
Once estimated, the factor structure can be evaluated by confirmatory factor analysis (CFA). CFA is a powerful statistical method allowing for testing specific hypotheses about the factor structure by providing an indication of overall fit and precise criteria for assessing construct validity, i.e., the degree of correspondence between constructs and their measures 18,19,20. This method evaluates whether a pre-specified factor structure provides a good fit to the data 7.
Considering that the effects of rotation methods on the factor structure, its interpretability and construct validity remain unclear in the field of nutritional epidemiology, the present study aimed to investigate the effects of both orthogonal and oblique rotation methods on composition, interpretability and construct validity of empirically derived dietary patterns. This study is expected to advance current knowledge on factor analysis procedures and to improve guidance for researchers interested in dietary pattern investigation.

Study population
Data came from the Health Survey of the City of São Paulo, a cross-sectional population-based survey using a complex multistage sampling design to collect health and nutrition information as well as life conditions on a representative sample of residents of the city of São Paulo, Southeastern Brazil, between March 2008 and August 2011.
A two-stage cluster sampling of census tracts and households was performed. In the first stage, a total of 70 census tracts were randomly selected from the 267 urban census tracts in the city of São Paulo as the primary sampling units (PSU). In the second stage, 16,607 households were randomly selected within census tracts.
The main study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human subjects were approved by the Human Research Ethics Committee of the School of Public Health at the University of São Paulo. Written informed consent was obtained from all participants who agreed to participate.

Socioeconomic, anthropometric and lifestyle data collection
A structured questionnaire with information about socioeconomic (per capita family income; educational level), anthropometric (body weight and height), demographic (skin color, age) and lifestyle characteristics [smoking status; alcohol use; physical activity - International Physical Activity Questionnaire (IPAQ)] was applied at the individual's home by trained interviewers.

Dietary data collection
Dietary data were collected by both face-to-face and telephone interviews. In the face-to-face interview, the first 24-hour dietary recall (24HR) was collected according to procedures described in the USDA Five-Step Multiple Pass Method 21. This method guides the individual through a 24-hour reference period of food intake (most commonly, the day before the interview) and provides several opportunities for individuals to remember and describe all foods and beverages they have consumed 21. During the telephone interview, the second 24HR was collected according to the interviewing system incorporated into the University of Minnesota's Nutrition Data System for Research (NDS-R). This interviewing system enhances data quality since it standardizes the probes about foods and portions consumed 22. All individuals were advised to report food consumption in household measures as well as to mention the eating occasions, meal time, cooking methods, seasonings and brand names. Quality control of the 24HR was conducted during data collection in order to identify and correct reporting errors. Dietary data collection occurred on non-consecutive days throughout all seasons and days of the week.
After dietary data collection, all household measures reported in each 24HR were converted into grams and milliliters according to Brazilian publications, which were also used to provide standard recipes of regional food preparations 23,24. The NDS-R, version 2007, was also used to determine the nutrient content of each food and beverage consumed. This program was developed by the Nutrition Coordinating Center at the University of Minnesota, Minneapolis, USA, and has the USDA Food Composition Table as the primary database source.

Foods grouping
A total of 1,169 different foods were reported in both 24HR and were collapsed into 38 food groups for factor analysis. Foods consumed by at least 5% of the sample evaluated (948 foods) were combined according to the previously used criteria: similarity of the nutrient profile 25,26,27 (e.g., all types of coffees were combined into the "Coffee" group) and the particular dietary habits and culinary usage of the Southeastern Brazilian population 28 (e.g., "Beans" group includes brown and black beans because they are cooked pulses that are usually eaten with rice, whereas the "Other pulses" group includes soybeans, lentils, chickpeas and snow peas because these are usually consumed in different preparations, such as soups, creams and salads).
The correlation matrix of food groups was analyzed to identify how the food groups were correlated to each other. The correlation matrix revealed that four food groups (Cereals, Flours, Roots and Tubers, and Seafood) did not show a significant correlation (p-value > 0.05) with any other food group, and were therefore excluded from further analysis. A detailed description of the 34 food groups and their composition is provided in Table 1.
The food group intakes, in grams, were adjusted for the within-person variation through the web-based statistical modeling technique Multiple Source Method (MSM) before factor analysis. This is a statistical method developed within the European Food Consumption and Validation Project (EFCOVAL) which is suitable for estimating the usual nutrient and food intakes (including those episodically consumed) based on two or more short-term dietary methods such as 24HR 29.

Statistical analysis
Sociodemographic, anthropometric and lifestyle characteristics of participants were described by sex and compared through a Chi-squared test. All descriptive analyses were conducted using Stata version 12.0 (Stata Corp., College Station, USA), considering the sampling design effect (svy command for proportion analysis) and significance level of 5%.
Dietary patterns were derived from EFA using the robust maximum likelihood parameter estimation (MLR) available in Mplus software (version 6.12; Muthén & Muthén, Los Angeles, USA). MLR was chosen because it is an estimation procedure appropriate to non-normally distributed data allowing for complex sampling designs and is also available for use in CFA 30. It leads to more appropriate estimates than the conventional maximum likelihood estimation when the assumption of multivariate normal distribution does not hold 31.
The Kaiser-Meyer-Olkin (KMO) test and Bartlett's sphericity test were used to measure the sample adequacy before deriving dietary patterns. KMO values above 0.50 and p-value < 0.05 for Bartlett's sphericity test were considered acceptable 32. The communalities of the food groups were calculated, representing the variance of each observed variable explained by the factor solution. Also, the percentage of variance explained by the factors was estimated for each rotation method.
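As an illustration of the two adequacy checks, the sketch below computes the overall KMO measure and Bartlett's sphericity statistic from a correlation matrix using their standard textbook definitions. It is not the procedure run in Mplus for this study; the function names and the 3 x 3 example matrix are hypothetical, and the p-value step is left out because it needs a chi-squared survival function (for instance `scipy.stats.chi2.sf`).

```python
import math

def invert_with_det(matrix):
    """Gauss-Jordan inversion with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    # Augment the matrix with the identity: [R | I]
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

def kmo(corr):
    """Overall KMO: squared correlations relative to squared correlations
    plus squared partial correlations (off-diagonal entries only)."""
    inv, _ = invert_with_det(corr)
    p = len(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)

def bartlett_sphericity(corr, n_obs):
    """Bartlett's chi-squared statistic and its degrees of freedom."""
    p = len(corr)
    _, det = invert_with_det(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2

# Hypothetical correlation matrix, not taken from the study data
example_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
example_kmo = kmo(example_corr)
example_chi2, example_df = bartlett_sphericity(example_corr, n_obs=100)
```

With the thresholds stated above, `example_kmo` would be compared against 0.50 and the Bartlett statistic referred to a chi-squared distribution with `example_df` degrees of freedom.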

In order to identify the number of factors to retain, the Kaiser criterion (eigenvalue > 1.0) was used in the first step. This criterion is one of the most widely used in EFA with the rationale that the minimum variance explained by the factor should be equal to or greater than the variance of one single observed variable 33. In this study, the Kaiser criterion would lead to the retaining of 14 factors, which is an excessive number of factors for further analysis. Hence, a plot of the eigenvalues (Cattell's scree test) was investigated in the second step and suggested two break points in the data that afforded two and four factor solutions (Figure 1). In the third step, the interpretability of the two and four factor solutions was investigated. The two factor solution was more interpretable than the four factor solution and was therefore retained to investigate the effects of the factor rotations on the composition, interpretability and construct validity of each factor. For interpretation of the factor solution, food groups with a positive factor loading were considered as contributing directly to the factor, while food groups with negative loadings were considered to be inversely correlated with the factor. Considering the methodological purposes of this study, the factors were presented with alphanumeric labels rather than descriptive names, in order to facilitate reporting of results.
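The first two steps can be sketched in a few lines, assuming the eigenvalues of the correlation matrix are already available. The function names and the six example eigenvalues are hypothetical and do not reproduce the eigenvalues plotted in Figure 1.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Number of factors whose eigenvalue exceeds the threshold (Kaiser criterion)."""
    return sum(1 for ev in eigenvalues if ev > threshold)

def explained_variance(eigenvalues):
    """Share of total variance per factor. For a correlation matrix the
    eigenvalues sum to the number of observed variables."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def scree_drops(eigenvalues):
    """Differences between successive sorted eigenvalues; unusually large
    drops mark candidate break points on a scree plot."""
    ordered = sorted(eigenvalues, reverse=True)
    return [a - b for a, b in zip(ordered, ordered[1:])]

# Hypothetical eigenvalues for six variables (they sum to 6)
example_eigenvalues = [2.1, 1.3, 0.9, 0.7, 0.6, 0.4]
n_kaiser = kaiser_retained(example_eigenvalues)
shares = explained_variance(example_eigenvalues)
drops = scree_drops(example_eigenvalues)
```

The third step, judging interpretability, remains a substantive decision that such a helper cannot automate.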
The factor rotations selected for this research were the same as those reported in previous studies on dietary pattern analysis that used factor analysis or principal component analysis: the orthogonal varimax 34,35,36,37,38,39,40, the oblique promax 11,12,13,14,15 and direct oblimin 16,17. In brief, the varimax is a type of orthogonal rotation that attempts to maximize the variance of squared loadings on a factor, i.e., to reduce the cross-loadings of the variables, leading to uncorrelated simple factor structures 41. The promax is an oblique rotation that is performed in two stages. In the first stage, an initial matrix of loadings is obtained through a varimax rotation and raised to some power (kappa), usually ranging from 2 to 4, to define a target matrix with a simpler factor structure. In the second stage, the oblique solution is obtained by computing a least squares fit to the target matrix 42. In this study, the Mplus default promax rotation power (kappa = 4) was used. Direct oblimin is another type of oblique rotation that aims to produce factors with perfect simple structure, i.e., factors with cross-loadings near zero or equal to zero. For this, a delta parameter ranging from 0 to 1 should be set. In this study, a delta equal to zero was chosen in an attempt to produce a simple factor structure 8,43.
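A minimal sketch of varimax and promax for the two-factor case may make the two stages concrete. It assumes raw (non-normalized) varimax computed with the closed-form planar rotation angle, and a promax step that fits the varimax loadings to the powered target by least squares and then rescales the factors to unit variance. The function names are hypothetical, and Mplus may differ in normalization and other details, so this is not a reproduction of the software's routine.

```python
import math

def varimax_two_factors(loadings):
    """Raw varimax for two factors using the closed-form rotation angle."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    # Angle that maximizes the variance of the squared loadings
    phi = math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p) / 4
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    return [(x * cos_p + y * sin_p, -x * sin_p + y * cos_p) for x, y in loadings]

def promax_two_factors(loadings, kappa=4):
    """Promax for two factors; returns (pattern loadings, factor correlations)."""
    rotated = varimax_two_factors(loadings)
    # Stage 1: target = varimax loadings raised to kappa, signs preserved
    target = [tuple(math.copysign(abs(v) ** kappa, v) for v in row) for row in rotated]

    def cross(m, n):  # 2x2 product M'N for two lists of rows
        return [[sum(r[i] * s[j] for r, s in zip(m, n)) for j in range(2)]
                for i in range(2)]

    def inv2(m):  # inverse of a 2x2 matrix
        det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
        return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

    def mul(m, n):  # rows of m times a 2x2 matrix n
        return [[sum(row[k] * n[k][j] for k in range(2)) for j in range(2)]
                for row in m]

    # Stage 2: least squares fit of the varimax loadings to the target
    u = mul(inv2(cross(rotated, rotated)), cross(rotated, target))
    d = inv2(cross(u, u))
    scale = [math.sqrt(d[0][0]), math.sqrt(d[1][1])]
    u = [[u[i][j] * scale[j] for j in range(2)] for i in range(2)]
    pattern = mul(rotated, u)
    factor_corr = inv2(cross(u, u))  # unit diagonal after rescaling
    return pattern, factor_corr
```

The off-diagonal entry of `factor_corr` is the interfactor correlation that an oblique rotation allows and an orthogonal rotation fixes at zero.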
After rotation, two factor loading cut-offs were applied to select the food items for CFA: ≥ |0.20| and ≥ |0.25|. These cut-offs were chosen because they represent two factor loading cut-offs applied in dietary pattern studies 34,35,36,37,38,39,40,44,45,46,47,48,49 that would lead, in this study, to a less restrictive number of food items than the most commonly applied cut-off (i.e., ≥ |0.30|). The CFA was executed in Mplus software 6.12 to assess the construct validity of each dietary pattern derived using the MLR estimation method.
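The selection step amounts to a simple filter on the rotated loading matrix. The sketch below uses hypothetical helper names and invented loadings (not the values reported in Table 2) to show how raising the cut-off can remove a cross-loading item from one factor.

```python
def select_items(loadings, cutoff):
    """Map each factor index to the items whose absolute loading meets the cut-off."""
    n_factors = len(next(iter(loadings.values())))
    selected = {f: [] for f in range(n_factors)}
    for item, row in loadings.items():
        for f, value in enumerate(row):
            if abs(value) >= cutoff:
                selected[f].append(item)
    return selected

def cross_loading_items(loadings, cutoff):
    """Items that meet the cut-off on more than one factor."""
    return [item for item, row in loadings.items()
            if sum(abs(v) >= cutoff for v in row) > 1]

# Invented loadings, item: (Factor 1, Factor 2); for illustration only
hypothetical_loadings = {
    "rice": (0.55, -0.05),
    "beans": (0.50, 0.02),
    "whole breads": (-0.22, 0.27),
    "leafy vegetables": (0.03, 0.60),
    "coffee": (0.10, 0.08),
}

at_020 = select_items(hypothetical_loadings, 0.20)
at_025 = select_items(hypothetical_loadings, 0.25)
cross_020 = cross_loading_items(hypothetical_loadings, 0.20)
cross_025 = cross_loading_items(hypothetical_loadings, 0.25)
```

In this invented example, "whole breads" enters both factors at 0.20 but only the second factor at 0.25, so the item sets passed on to CFA differ between the two cut-offs.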

Participant characteristics
Participants included 424 men and 678 women. Men and women had the same distribution of age, with about 46% of them aged 60 years and more (p-value = 0.669). Also, around 50% of men and 46% of women were normal weight (p-value = 0.433); 60% of men and 64% of women had low educational level (up to 8 years of study) (p-value = 0.069) and 83% of men and 84% of women had a maximum per capita income of R$ 1,000 per month (p-value = 0.599). A significantly higher proportion of women compared with men were of white skin color (63% vs. 57%, p-value = 0.040), non-smokers (85% vs. 76%, p-value < 0.001), non-alcohol drinkers (62% vs. 41%, p-value < 0.001) and with insufficient/sedentary physical activity level (55% vs. 40%, p-value < 0.001) (data not shown).

Dietary patterns composition and interpretability
Table 2 shows the communalities of the dietary variables as well as the factor-loading matrix of the dietary patterns derived from EFA according to different rotation methods. The KMO test and Bartlett's sphericity test confirmed the sample adequacy for factor analysis (KMO = 0.59 and p < 0.001, respectively). The percentage of variance explained by each factor was quite similar across rotation methods, ranging from 5.15 to 5.21 for Factor 1 and from 4.43 to 4.52 for Factor 2. Considering factor loadings ≥ |0.20|, the composition of the first dietary pattern (Factor 1) extracted by varimax rotation was slightly different from that extracted by both oblique rotations, promax and oblimin. The Factor 1 extracted by varimax rotation is composed of the traditional foods consumed by the Brazilian population, namely rice, beans, sugar, white breads, butter and margarine, beef (positive loadings) and low-fat milk (negative loading). The Factor 1 patterns extracted by promax and oblimin rotations were identical to each other and included the aforementioned foods plus whole breads and white cheese, both with negative loadings. The second dietary pattern (Factor 2) was similar across factor rotations and was composed of salad dressing, leafy vegetables, non-leafy vegetables, spices, whole breads, white cheese, fruits and fruit juices. Among the food groups evaluated, salad dressing, rice, beans, leafy and non-leafy vegetables were those with the highest percentage of variance explained by the factors, i.e., with the highest communalities.
When the factor loading cut-off was increased from ≥ |0.20| to ≥ |0.25|, the differences in Factor 1 across rotation methods were no longer observed. This factor comprised only four food items which characterize Brazilian staple foods, i.e., rice, beans, sugar and white breads. The Factor 2 extracted by varimax and promax rotations had a similar composition including foods consumed in a typical vegetable-based diet: salad dressing, leafy vegetables, non-leafy vegetables and spices. With respect to oblimin rotation, the Factor 2 comprised all the aforementioned vegetable foods plus whole breads.

Construct validity of dietary patterns
Table 3 presents the CFA results according to the factor loading cut-off ≥ |0.20| and different rotation methods. Regardless of rotation, the factor loadings were statistically significant for all dietary patterns (p-value < 0.05) and similar to the factor loadings obtained in EFA. Since promax and oblimin are oblique rotations and produced identical dietary patterns at cut-off ≥ |0.20|, the results of the CFA for these rotations were also identical. It should be pointed out that promax and oblimin produced dietary patterns with small but significant correlations (r = 0.17, p-value < 0.01) (data not shown). Irrespective of the factor rotation applied, none of the dietary patterns derived showed an acceptable model fit based on the fit indexes evaluated other than SRMR (whose values were < 0.08).
The factor loadings of all food items showed statistical significance at cut-off ≥ |0.25| for both orthogonal and oblique rotations (Table 4). The promax rotation, however, showed a better model fit than either varimax or oblimin. Although no differences were observed in the composition of the dietary patterns derived by varimax and promax rotations, the CFI, TLI, RMSEA and SRMR indicated a better fit for promax than for varimax. The oblimin rotation produced the worst result, with the CFI and TLI values being < 0.90. The interfactor correlation was small but significant with both promax (r = 0.19, p-value < 0.01) and oblimin rotations (r = 0.18, p-value < 0.01) (data not shown).

Discussion
This study was the first to provide evidence about the effects of different rotation methods in EFA on the composition and interpretability of dietary patterns, effects that may be influenced by the factor loading cut-off selected during EFA. Considering the factor loading cut-off ≥ |0.20|, differences in composition and in interpretability of the first dietary pattern (Factor 1) but not of the second pattern (Factor 2) were observed between orthogonal and oblique rotations, i.e., between varimax and promax/oblimin rotations. These differences may be explained by the cross-loadings ≥ |0.20| of two food groups, white cheeses and whole breads, that occurred with oblique rotations. However, increasing the factor loading cut-off from ≥ |0.20| to ≥ |0.25| eliminated the cross-loadings and also the differences in the composition of the Factor 1 across rotation methods. Despite the differences produced on dietary pattern composition, the rotation methods produced similar results concerning the percentage of variance explained for Factors 1 and 2. Differences in composition and interpretability of the dietary patterns across rotation methods may be less remarkable at higher factor loading cut-offs because a higher cut-off can reduce the occurrence of cross-loadings in the factor structure. It should be emphasized that although all rotations selected for this study aimed to reduce the cross-loadings toward zero 8,41,42,43, only the orthogonal varimax attained this purpose at both factor loading cut-offs. Therefore, researchers must also consider whether cross-loadings are of interest when selecting the rotation method and the factor loading cut-off for EFA in dietary pattern studies.
Another noteworthy finding concerns the construct validity of the dietary patterns derived with different rotation methods and factor loading cut-offs. Regardless of rotation, the factors derived with the factor loading cut-off ≥ |0.20| did not show acceptable construct validity. Even if it was adequate to produce meaningful dietary patterns, this cut-off was too low to select food items that could be valid to depict the dietary patterns of the population evaluated.
In fact, only the factors derived by promax rotation with a factor loading cut-off ≥ |0.25| in EFA showed an acceptable construct validity as indicated by all goodness-of-fit indexes except the adjusted Chi-squared test. Unlike the other indexes evaluated in this study, the adjusted Chi-squared test is directly influenced by the sample size and the number of variables observed. Hence, the larger the sample size and the number of variables, the higher the Chi-squared value. Also, the higher the number of free parameters of the model, the lower the number of degrees of freedom of the test 32. Considering the limitations of the adjusted Chi-squared test, experts recommend evaluating model fit by different goodness-of-fit indexes including those analyzed in this study, because they reflect different aspects of the model adjustment 54.
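To show how the Chi-squared value, its degrees of freedom and the sample size feed into the other indexes, the sketch below uses the common textbook formulas for RMSEA, CFI and TLI. The function names and all numeric inputs are hypothetical, and the formulas are an assumption: software using robust (scaled) Chi-squared statistics, as with MLR in Mplus, may apply corrections not shown here.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (textbook form with n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index: tested model against the baseline (null) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index based on Chi-squared to degrees-of-freedom ratios."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

# Hypothetical inputs: a two-factor CFA with 8 items (19 df) and its
# baseline model (28 df), fitted to n = 1102 observations
fit = {
    "RMSEA": rmsea(60.0, 19, 1102),
    "CFI": cfi(60.0, 19, 500.0, 28),
    "TLI": tli(60.0, 19, 500.0, 28),
}
```

Because the sample size enters RMSEA only through the denominator, the same Chi-squared excess over its degrees of freedom yields a smaller RMSEA in a larger sample, which is one reason these indexes can diverge from the Chi-squared test itself.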
It should also be mentioned that the orthogonal varimax rotation extracted factors with the same variables as the promax rotation, but without construct validity. This means that the assumption of independence of the factor structure was inappropriate for these data. Indeed, the correlation between factors derived by oblique promax rotation was significantly different from zero (r = 0.19). Researchers must therefore be aware that choosing an orthogonal rotation based solely on the assumption that the factors are independent may fail to extract valid factors. Hence, it is important to verify whether this assumption is appropriate before deriving and interpreting the dietary patterns. If so, both orthogonal and oblique rotations will probably lead to similar factors at high factor loading cut-offs (e.g., ≥ |0.25|). Moreover, it must be considered that the correlation between dietary patterns may produce factor scores that are also correlated, and thus, caution is needed when planning to use these scores as dependent variables in regression models. Since the assumption of independence of the observations is required for traditional regression models, a methodological alternative is to apply exploratory structural equation modeling (ESEM). This method has emerged as a suitable multivariate statistical modeling technique to examine associations between latent (e.g., dietary patterns) and observed variables, allowing for multiple dependent and independent variables in a single equation 30. The ESEM relies on the covariance structure of the observed variables and can be interpreted as a combination of EFA, CFA and regression analysis, and is indicated when the researcher has a weak hypothesis about how multiple observed variables load on the factors 30. Another advantage of the ESEM for dietary pattern analysis is the possibility of testing the significance of factor loadings in lieu of applying predetermined factor loading cut-offs, which reduces the subjectivity during modeling. More details about this method can be found in Asparouhov & Muthén 55.
This study has some methodological features that should be addressed. First, the dietary patterns derived were based on data collected by a short-term dietary assessment method, i.e., by two non-consecutive 24HR. It is known that, although the short-term dietary assessment methods provide detailed data about types and amounts of foods consumed 56, they lead to a large within-person variation of dietary estimates. This variation could attenuate the correlation matrix of the foods and thus the factor loadings observed in each dietary pattern. To overcome this, the food groups were adjusted for the within-person variation through the MSM before proceeding to factor analysis as performed by Selem et al. 49. It is worth mentioning that this adjustment may be considered a methodological advance in dietary pattern analysis and may have contributed to enhance the reliability of the results.
Second, the estimation method used in EFA to derive dietary patterns in this study differed from the frequently used method in other dietary pattern studies. The robust maximum likelihood parameter estimation (MLR) was chosen in EFA in lieu of the principal component factor method (PCF) because it was also available for use in CFA as an appropriate estimator for non-normally distributed data 31. The use of MLR in both EFA and CFA aimed to avoid a misinterpretation of the results that might occur if different estimation procedures were applied for deriving dietary patterns and for assessing their construct validity.
Finally, this study could not evaluate the effects of rotation methods on composition, interpretability and construct validity of dietary patterns derived at the most applied factor loading cut-off, i.e., ≥ |0.30|, because it would lead to a very restrictive number of food items for factor interpretability and CFA purposes. Nonetheless, the authors ensured methodological strictness by selecting two other cut-offs (≥ |0.20| and ≥ |0.25|) that are also commonly applied in dietary pattern studies 34,35,36,37,38,39,40,44,45,46,47,48,49.
In summary, the effects of rotation methods on composition, interpretability and construct validity of dietary patterns differed according to the factor loading cut-off used in EFA. Less remarkable differences in composition and interpretability of the dietary patterns according to rotation method may occur at higher cut-offs such as ≥ |0.25| compared with lower ones (≥ |0.20|). Irrespective of rotation method, dietary patterns derived at factor loading cut-off ≥ |0.20| did not show acceptable construct validity. At factor loading cut-off ≥ |0.25|, however, the promax rotation showed a better model fit than either varimax or oblimin. Hence, the authors recommend performing at least one orthogonal and one oblique rotation in EFA, applying the factor loading cut-off and then comparing the factor solutions. Moreover, the CFA should be conducted to test the construct validity of the dietary patterns derived and to verify whether the factor loading cut-off chosen during the EFA is adequate or not to select the food items that truly depict dietary patterns of the population. Further studies are needed to investigate the effects of other rotation methods on the dietary patterns derived in different populations.

Summary
The study aimed to investigate the effects of rotation methods on the interpretability and construct validity of dietary patterns derived from a representative sample of 1,102 Brazilian adults. The patterns were derived from an exploratory factor analysis. Orthogonal (varimax) and oblique (promax, direct oblimin) rotations were applied. The construct validity of the patterns was assessed by confirmatory factor analysis, according to the factor loading cut-offs (≥ |0.20| and ≥ |0.25|). Model fit indexes were analyzed. Differences in the composition and interpretation of the first factor were observed between varimax and promax/oblimin at cut-off ≥ |0.20|. At cut-off ≥ |0.25|, differences were no longer observed. None of the patterns derived at cut-off ≥ |0.20| showed an acceptable model fit. At cut-off ≥ |0.25|, the promax rotation produced the best fit. The effects of factor rotations on the patterns varied according to the factor loading cut-off used in exploratory factor analysis.
Food Consumption; Food Habits; Public Health Nutrition; Factor Analysis

Contributors
M. A. Castro proposed the analytical methodology for the study, carried out the statistical analysis and wrote the manuscript. V. T. Baltar provided expertise in statistical analysis and contributed to the manuscript write-up. S. S. C. Selem contributed towards the data analysis of food consumption and the manuscript write-up. D. M. L. Marchioni supervised the statistical analysis, provided expertise in the data analysis of food consumption and carried out a critical revision of the manuscript. R. M. Fisberg coordinated the data collection, collaborated with the write-up of the manuscript and was responsible for a critical revision of the text.

Figure 1
Scree plot of the eigenvalues of unrotated factors. Health Survey of the City of São Paulo, Brazil, 2008-2011.

Table 1
Description of the food groups used in the dietary pattern analysis. Health Survey of the City of São Paulo, Brazil, 2008-2011.

Table 2
Factor-loading matrix for dietary patterns derived according to different rotation methods. Health Survey of the City of São Paulo, Brazil, 2008-2011.

Table 4
Confirmatory factor analysis of dietary patterns derived according to factor loadings ≥ |0.25| * and rotation criteria.Health Survey of the City of São Paulo,