Rosenberg Self-Esteem Scale: Method Effect and Gender Invariance

This study aimed to evaluate the dimensionality of the Rosenberg Self-Esteem Scale (RSES), by testing the adjustment of eight factorial models: a one-factor and two-factor model and six single-factor models controlling for the method effect associated with the wording of negative and positive items, through the correlated traits-correlated uniqueness (CTCU) and correlated traits-correlated methods (CTCM) approaches. We also tested measurement invariance across gender. A total of 689 participants took part in the study, with ages between 18 and 70 years (M = 25.5; SD = 8.06), mainly females (77.1%), who answered the RSES and sociodemographic questions. The results showed that single-factor models controlling for the effect of negative items alone or positive and negative items together best fit the data. The results also indicated that the RSES is invariant across gender, presenting the same theoretical structure and psychological meaning for men and women.

Self-esteem can be defined as the evaluation that individuals make of themselves, anchored on personal feelings and beliefs about their skills, intelligence, social relations and future expectations, expressed in a positive attitude (of approval) or a negative one (of depreciation), indicating to what measure individuals believe they are capable, relevant, successful and deserving (Rosenberg, 1965). According to Hutz and Zanon (2011), high self-esteem apparently benefits people, since when they feel good about themselves, they are better able to deal effectively with challenges and negative feedback and generally believe that others value and respect them. In contrast, individuals with low self-esteem perceive the world through a negative filter, and their general self-aversion extends to the perceptions of everything in their surroundings. Therefore, self-esteem is associated with important aspects during people's lives. Studies have shown that high levels of self-esteem, for example, are correlated with subjective well-being (Kong, Zhao & You, 2013), satisfaction with interpersonal relationships (Erol & Orth, 2016), and satisfaction with life (Moksnes & Espnes, 2013). On the other hand, low self-esteem is correlated with, among other feelings, depression (Rieger, Gollner, Trautwin & Roberts, 2016) and suicidal ideation (Kleiman & Riskind, 2013).
Researchers have proposed a wide range of self-esteem measures, including self-reported scales (Robins, Hendin & Trzesniewski, 2001; Rosenberg, 1965), indirect measures, such as evaluation of the preference for names and initials (Gebauer, Riketta, Broemer & Maio, 2008), and implicit association tests (Falk, Heine, Takemura, Zhang & Hsu, 2015). Among these, one of the most used measures continues to be the Rosenberg Self-Esteem Scale (RSES; Rosenberg, 1965). This scale is composed of ten items, five worded positively and five worded negatively, addressing overall self-esteem as a set of feelings and thoughts of individuals about their own value, competence and adequacy, reflecting a positive or negative attitude about themselves. According to Sbicigo, Bandeira and Dell'Aglio (2010), the RSES was proposed as a one-dimensional measure, in which self-esteem is classified in three levels: low, characterized as feelings of incompetence, inadequacy and inability to face life's challenges; medium, characterized by fluctuation between feelings of approval and rejection; and high, consisting of self-judgment of value, confidence and competence.
Many studies have been conducted to investigate the factorial structure of the RSES, including in Brazil. The results obtained, however, have not been uniform. Studies performed in different countries (Supple, Su, Plunkett, Peterson & Bush, 2012), especially adopting an exploratory factorial approach, have indicated that the RSES is better represented by a two-factor structure, where one factor assesses positive self-esteem (composed of five positively worded items) and the other evaluates negative self-esteem (composed of five negatively worded items). On the other hand, various studies have also indicated a single-factor solution for the RSES (DiStefano & Motl, 2009; Michaelides et al., 2016). Of the validation studies conducted in Brazil, three have proposed a bifactorial structure (Avancini, Assis, Santos & Oliveira, 2007; Meurer, Luft, Benedetti & Mazo, 2012; Sbicigo et al., 2010) while one indicated a unifactorial structure (Hutz & Zanon, 2011).
The authors that advocate a two-factor structure for the RSES argue that positive and negative self-esteem are different concepts, which have presented different patterns of association with other theoretically related constructs (Supple et al., 2012). For example, Owens (1994) observed that positive self-esteem was significantly related to school grades but not related to depressive symptoms, while negative self-esteem was positively related to depressive symptoms and delinquent behaviors, but not with school grades. Supple et al. (2012), in a study of self-esteem among teenagers, also found different patterns of correlation, with positive and negative self-esteem being differently predicted by parental behavior and academic motivation.
This controversy over the factorial structure is reflected in the debate about whether the bifactorial structure indeed encompasses two theoretically distinct elements related to self-esteem or results from a type of method effect that occurs when including items that are positively and negatively worded in the same scale (DiStefano & Motl, 2009; Urbán, Szigeti, Kökönyei & Demetrovics, 2014). This method effect refers to the variance that arises from, among other sources, the positive or negative phrasing of the items, instead of the variance related to the construct of interest, resulting in a systematic variation that is undesirable and impairs the construct's measurement (Lindwall et al., 2012). This influence has been called the wording effect, and is a type of measurement method effect (Wu, 2008).
The use of positive and negative items in psychological measures is a common practice, seeking to avoid acquiescence error, i.e., the tendency of respondents to agree or disagree with all the items in general, regardless of their content (Podsakoff, MacKenzie & Podsakoff, 2012). However, the inclusion of positive and negative items can introduce systematic measurement errors that affect the analysis and interpretation of the data, besides the effects of the construct being assessed. This can result, for example, in the emergence of factors that separately group positive and negative items, even when the content of these items is congruent (DiStefano & Motl, 2006).
One of the explanations for the phenomenon of negative and positive items forming separate factors involves the difficulty or the way respondents cognitively process items written inversely, introducing variation in responses that is not associated with the construct under analysis (Wu, 2008). Therefore, the negative items function as cognitive "speed bumps" that require a more elaborate cognitive process instead of an automatic response at the moment of answering the item, thus resulting in variation due to the measurement effect (Podsakoff et al., 2012). Although this effect is primarily related to negatively worded items, some studies have indicated that positively worded items also can bias the responses, because that effect is mainly related to the participants' interpretation of the item's content (Wu, 2008).
Some strategies have been used to evaluate the wording effect of the items on the factorial structure of scales, such as using confirmatory factor analysis (CFA), in which it is possible to control for the unique variance associated with negative items, positive items, or both together (DiStefano & Motl, 2006; Wu, 2008). Unlike exploratory factor analysis, CFA permits specifying possible method effects and comparing alternative factor models that specify these effects or not. Using CFA, two methodological strategies have been developed, derived from multitrait-multimethod models, to separate the variance derived from the construct from that derived from the method of wording the items (Lindwall et al., 2012). One of them is the correlated traits-correlated uniqueness (CTCU) model, in which the method effect is modeled by establishing covariances between the errors of the affected items. In contrast, the correlated traits-correlated methods (CTCM) model treats the response effect of the items as a latent variable that is incorporated in the analysis as a distinct factor, orthogonal to the factor of the construct to be evaluated.
According to Lindwall et al. (2012), both strategies allow comparing whether the models in which the method effect is specified better fit the data than models that do not include this effect. If the model presents better adjustment, the existence of the method effect can be inferred. Moreover, the method effect in CTCM models can be quantified and correlated with other variables, such as gender or socioeconomic class, which cannot be done by using CTCU models. On the other hand, CTCM models can cause problems in performing the analysis, such as under-identification and negative variance estimates, among others (Marsh, Scalas & Nagengast, 2010). For these reasons, we believe that both approaches should be employed, even though Lance, Noble and Scullen (2002) stated that when convergent and acceptable factor solutions are obtained, the use of a CTCM model is preferable.
Various studies aimed at assessing the method effect on the factor structure of the RSES, utilizing both CTCU and CTCM models, have indicated that the unifactorial structure controlling for the method effect of the wording of the negative items presents a better fit to the data than a bifactorial structure or a unifactorial structure without that control (DiStefano & Motl, 2009; Lindwall et al., 2012; Marsh et al., 2010; Michaelides et al., 2016; Tomás, Oliver, Galiana, Sancho & Lila, 2013; Urbán et al., 2014; Wu, Zuo, Wen & Yan, 2017). On the other hand, some studies have also indicated that the models that control for both the method effects associated with negatively and positively worded items fit the data well (Lindwall et al., 2012; Wu, 2008).
According to some researchers, the method effect related to the wording of items can vary as a function of the population or be more pronounced in certain groups than in others, such as between men and women (DiStefano & Motl, 2009; Tomás et al., 2013). Differences between men and women have been observed in relation to global self-esteem, with men generally presenting higher levels (Bleidorn et al., 2016; Tomás et al., 2013). According to DiStefano and Motl (2006), the differences in average RSES scores between men and women might be associated with the different responses to negatively worded items. In other words, people of different genders can provide stronger or weaker responses to negatively worded items, hence causing differences in the average self-esteem scores. Studies examining the method effect associated with both negative and positive items (DiStefano & Motl, 2009; Lindwall et al., 2012) have not observed differences in the adjustment of the models to the data when testing the invariance in relation to gender, although they have observed differences in the average raw scores of men and women. On the other hand, Michaelides et al. (2016), when testing the gender invariance in a model controlling for the method effect of positive and negative items, observed configurational and metric invariance but not scalar invariance, making it impossible to compare the latent means.
In short, there is no consensus regarding the factor structure of the RSES, with studies adopting an exploratory approach indicating a two-factor structure, while others that have controlled for the wording effect of items have indicated a single-factor structure is most suitable. The studies of the RSES conducted in Brazil have not considered the method effect associated with the negative and positive items when testing the factor structure. Therefore, this study sought to gather evidence about the factor structure of the RSES in Brazil by testing the fit of bifactorial and unifactorial structures, and of unifactorial solutions while controlling for the method effect associated with the items through the CTCU and CTCM strategies. We also evaluated whether the factor structure of the RSES is invariant between men and women.

Participants
A total of 689 people took part in the study, consisting of university students and people from the general population, all residing in the city of Fortaleza, Ceará. The participants' ages ranged from 18 to 70 years, with an average of 25.5 years (SD = 8.06), the majority being women (77.1%). The sample was chosen by convenience, so it was non-probabilistic.

Instruments
The participants received a booklet containing the Rosenberg Self-Esteem Scale (RSES) and sociodemographic questions. The RSES (Rosenberg, 1965), translated into Portuguese and adapted for Brazil by Hutz and Zanon (2011), is composed of ten items with statements related to feelings of self-esteem and self-acceptance to assess global self-esteem, containing five items related to a positive personal view (e.g., I feel that I have various good qualities; On the whole, I am satisfied with myself) and five items with a negative personal view (e.g., I feel I do not have much to be proud of; On the whole, I am inclined to believe that I am a failure). The participants were asked to state how much they agreed or disagreed with each statement, on a seven-point scale ranging from 1 (strongly disagree) to 7 (strongly agree).

Procedure
The participants were told that taking part in the survey was voluntary and that the information collected would be treated as confidential, only used for academic purposes. The data were gathered in various places, such as classrooms and public venues. All the items were answered individually. The research procedures were in line with the ethical rules on research with human beings, as specified in Resolution 510/2016 from the National Health Council. According to article 1 of the resolution, this study did not require registration in the CEP/CONEP system (Research Ethics Committee/National Research Ethics Commission) because of data aggregation without the possibility of individual identification.

Data analysis
We initially calculated descriptive statistics for the RSES items and tallied the overall score, followed by computing the skewness and kurtosis values and constructing the Pearson pairwise correlation matrix between the items and between them and the overall score. We used Cronbach's alpha to evaluate the internal consistency of the responses, followed by application of AMOS 18 to evaluate the factorial validity of the RSES. To compare the alternative factor models, we performed multiple confirmatory factor analyses, considering the covariance matrix and employing the maximum likelihood (ML) method. Only 0.5% of the observations had missing data, which were filled with the mean of the responses obtained in the corresponding item.
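The two preparatory steps described above (item-mean imputation and Cronbach's alpha) can be illustrated with a minimal sketch. This is not the authors' code (they report using AMOS 18); the function names and the toy responses are hypothetical, and the sketch assumes that negatively worded items have already been reverse-scored before alpha is computed.

```python
from statistics import mean, variance


def impute_item_means(rows):
    """Replace missing responses (None) with the mean of the observed
    responses to the same item (column)."""
    n_items = len(rows[0])
    item_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_items)
    ]
    return [
        [item_means[j] if r[j] is None else r[j] for j in range(n_items)]
        for r in rows
    ]


def reverse_score(value, low=1, high=7):
    """Reverse a response on a low..high scale (here the 1-7 scale)."""
    return low + high - value


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up toy data: 4 respondents x 3 items, one missing response.
toy = [[1, 2, 1], [3, None, 3], [5, 4, 5], [7, 6, 7]]
completed = impute_item_means(toy)
alpha = cronbach_alpha(completed)
```

Mean imputation is simple but shrinks item variances slightly; with only 0.5% of observations missing, as reported here, the practical impact is presumably small.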
To ascertain the fit of the proposed model and compare it with alternative models, we used the following indicators: chi-square (χ²), normalized chi-square (χ²/df), comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), Akaike information criterion (AIC), and the expected cross-validation index (ECVI). These indices have often been used in previous studies, and although each one has merits and limitations, when used together they provide strong indication of the fit of models to the data (Byrne, 2010). We tested the fit of eight factorial solutions for the RSES, presented in Figure 1. Model 1 is a single-factor solution while Model 2 represents a structure with two correlated factors, dividing the items between positive self-esteem and negative self-esteem. Models 3 and 4 represent a single-factor structure of the RSES with the measurement errors correlated between positive items (Model 3) and between negative items (Model 4). Model 5 denotes a unifactorial structure with measurement errors correlated simultaneously between positive items and between negative items. Models 6 and 7 represent a unifactorial structure in which the method effect is controlled via a latent factor associated with the positive items (Model 6) and negative items (Model 7). Finally, Model 8 denotes a unifactorial structure with correlated latent factors to control for the method effect in the positive and negative items simultaneously.
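To make the differences among the eight specifications concrete, the sketch below counts free parameters and degrees of freedom for each model. It rests on assumptions that are not stated in the text: only the covariance matrix is analyzed (no mean structure), factor variances are fixed to 1 for identification, the method factors are orthogonal to the trait factor, and the CTCU models free every error covariance within a wording block. The item labels are placeholders, and the resulting counts may not match the values in Table 1.

```python
from itertools import combinations

# Placeholder labels; the actual item numbering is not given in the text.
POSITIVE = ["p1", "p2", "p3", "p4", "p5"]
NEGATIVE = ["n1", "n2", "n3", "n4", "n5"]
N_ITEMS = len(POSITIVE) + len(NEGATIVE)


def n_error_covariances(items):
    """Number of distinct item pairs, i.e. freed error covariances (CTCU)."""
    return len(list(combinations(items, 2)))


# Distinct variances and covariances among 10 items: 10 * 11 / 2 = 55.
moments = N_ITEMS * (N_ITEMS + 1) // 2

# Baseline: 10 trait loadings + 10 error variances (factor variance fixed).
base_parameters = 2 * N_ITEMS

# Extra free parameters each model adds on top of the baseline.
extra_parameters = {
    "Model 1": 0,                                    # one factor
    "Model 2": 1,                                    # factor correlation
    "Model 3": n_error_covariances(POSITIVE),        # CTCU, positive items
    "Model 4": n_error_covariances(NEGATIVE),        # CTCU, negative items
    "Model 5": n_error_covariances(POSITIVE)
    + n_error_covariances(NEGATIVE),                 # CTCU, both blocks
    "Model 6": len(POSITIVE),                        # CTCM method loadings
    "Model 7": len(NEGATIVE),                        # CTCM method loadings
    "Model 8": N_ITEMS + 1,                          # both + method correlation
}

degrees_of_freedom = {
    name: moments - base_parameters - extra
    for name, extra in extra_parameters.items()
}
```

A positive count is a necessary but not a sufficient condition for identification, so a model can still fail to yield a solution even when its degrees of freedom are positive.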
Finally, to assess whether the factor structure of the RSES was equivalent between men and women, we applied multigroup confirmatory factor analysis (MCFA). For this we tested four hierarchical invariance models (Byrne, 2010): configurational, metric, scalar and residual. The configurational invariance assesses whether the factor structure of the instrument has good fit to the data in both groups. The metric invariance evaluates whether the regression weights of the items are statistically equivalent between the groups analyzed. The scalar invariance measures whether the observed scores are related with the latent scores, i.e., if the individuals that obtained the same score for the latent variable also obtained the same score for the observed variables, irrespective of the group. Finally, the residual invariance of the items evaluates to what extent the measurement errors are equal for different groups.
When the models tested are nested, the invariance between the models can be ascertained by the difference of the chi-square value and degrees of freedom, Δχ²(Δdf), of the model evaluated in relation to the previous (less restricted) one. If the chi-square value resulting from this difference is not significant (considering the reference values in relation to the degrees of freedom resulting from the difference between the two models), this indicates the existence of invariance in relation to the model tested (Byrne, 2010). However, the use of only Δχ²(Δdf) has been criticized, with the recommendation to use other metrics, such as the difference between the CFI of the models (ΔCFI). In this respect, CFI differences below 0.01 provide evidence of invariance between the models tested (Byrne, 2010).
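The nested-model comparison just described can be sketched as follows. This is an illustration only, not the authors' procedure: the function names are hypothetical, the chi-square tail probability is computed with the closed-form series for integer degrees of freedom so that no external library is needed, and the ΔCFI cutoff of 0.01 is the one applied in the Results. The Δχ² and Δdf values are those reported in the Results; the ΔCFI value of 0.005 is a placeholder, since the text only states that the change was smaller than 0.01.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer df >= 1, using the finite closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of
    # x^(r - 1/2) / (1 * 3 * ... * (2r - 1)) for r = 1 .. (df - 1)/2
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def compare_nested(delta_chi2, delta_df, delta_cfi, alpha=0.05, cfi_cutoff=0.01):
    """Apply both criteria: a non-significant chi-square difference and a
    CFI change smaller than the cutoff each count as support for invariance."""
    p_value = chi2_sf(delta_chi2, delta_df)
    return {
        "p_value": p_value,
        "chi2_supports_invariance": p_value >= alpha,
        "cfi_supports_invariance": abs(delta_cfi) < cfi_cutoff,
    }


# Metric -> scalar step reported in the Results (delta CFI is a placeholder).
scalar_step = compare_nested(delta_chi2=21.4, delta_df=10, delta_cfi=0.005)
```

When the two criteria disagree, as in this step, the rule itself gives no verdict; the follow-up analysis of individual intercepts reported in the Results is one way to resolve the conflict.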

Results
Table 1 presents the fit indices for the eight models tested. Initially, Model 1 (simple unifactorial) did not adequately fit the data, while Model 2 (bifactorial, based on positive and negative items) presented a marginal fit to the data. The correlation between the two factors was moderate (r = -0.25; p < 0.001), indicating an orthogonal structure. The models including the method effect (3 and 4: controlling for the method effect using correlations between the errors; and 6, 7 and 8: controlling for the method effect through latent factors) presented generally better adjustment to the data than Models 1 and 2. However, the models that controlled for the method effect of the negative items (Models 4 and 7) had better fit than those that controlled for the method effect associated with the positive items (Models 3 and 6). Furthermore, Models 3 and 6 presented only slightly better fit than Model 2, indicating that controlling only for the method effect associated with the positive items did not significantly improve the fit to the data.
Finally, in relation to the models that controlled simultaneously for the effect of the positive and negative items (Models 5 and 8), Model 5 was an unidentified model, as also observed in other studies (Michaelides et al., 2016; Vasconcelos-Raposo, Fernandes, Teixeira & Bertelli, 2012; Supple et al., 2012; Wu et al., 2017). Due to the absence of non-significant correlations between the errors observed in Models 3 and 4, it was not possible to respecify Model 5, as proposed by Tomás and Oliver (1999), who recommended setting the non-significant correlations of unidentified models at zero. Therefore, Model 5 was excluded from further analysis. In turn, Model 8 presented adjustment indices considered good in the literature, although these indices were similar to the Model 4 indices, which had the best fit among the eight models tested. The value of Cronbach's alpha for global self-esteem was 0.82.
Since Model 4 best fit the data, we decided to evaluate whether this model was invariant between men and women. The fit indices for the hierarchical invariance models are presented in Table 1. From the least restrictive model (configurational) to the most restrictive one (residual), the fit indices were consistent and within the levels recommended in the literature, indicating the suitability to conduct a factorial invariance test for gender. The configurational model presented good adjustment indices, indicating that the one-factor structure, with control for the effect of the negative items through the correlations between the errors, was adequate for men and women. The metric invariance also was supported by the data, since the differences of the chi-square values were not significant and the CFI continued having the same value.
In the scalar invariance model, the chi-square value increased significantly in comparison with the previous one, Δχ² = 21.4 and Δdf = 10, p < 0.05. On the other hand, the change in the CFI was smaller than 0.01. Based on these inconsistent results, even considering that χ² is a sensitive and often biased indicator, we decided to conduct individual analyses of the intercepts of each item to identify which was degrading the model's fit. We observed that by allowing the intercept of item 3 to vary freely among the groups, the chi-square value diminished considerably, ceasing to be significantly different in relation to the metric invariance model (Δχ² = 16.30 and Δdf = 9, p = 0.061). Furthermore, the ΔCFI value less than 0.01 indicated that the RSES had partial scalar invariance according to our data.
Due to this partial scalar invariance between men and women, we compared the latent means, setting the average for women to zero. The latent mean estimated for men was not significantly higher than that for women (average difference = 0.04; standard error = 0.07; Z = 0.48; p = 0.63). For illustrative purposes, the raw scores observed for men (M = 5.27; SD = 1.07) and women (M = 5.08; SD = 1.08) presented a marginally significant difference, t(685) = 1.90, p = 0.058, Cohen's d = 0.18. Finally, to test for residual invariance (errors of the items were equal for the different groups), we restricted the correlations between the negative items of Model 4. There was a significant increase in the chi-square value (Δχ² = 47.8 and Δdf = 10, p < 0.01) and a change in the CFI of almost 0.01, indicating that the invariance of the residuals between men and women was not supported by the data.
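As a rough check on the raw-score comparison, the effect size can be recomputed from the reported means and standard deviations. The group sizes are not given in the text, so the sketch below assumes them from the reported 77.1% of 689 participants being women; this is an approximation (the reported degrees of freedom suggest slightly fewer cases were used), so the t value will only be close to the published one. Function names are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic assuming equal variances."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error


# ASSUMPTION: group sizes derived from 77.1% women in a sample of 689.
n_women = round(0.771 * 689)
n_men = 689 - n_women

d = cohens_d(5.27, 1.07, n_men, 5.08, 1.08, n_women)
t = student_t(5.27, 1.07, n_men, 5.08, 1.08, n_women)
```

Under these assumed group sizes the effect size rounds to the reported 0.18, which by conventional benchmarks is a small difference.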

Discussion
We investigated the factor structure of the Brazilian version of the RSES by testing the fit of two-factor and one-factor structures, as well as the fit of single-factor structures in which we controlled for the method effect associated with negative and positive items, through the correlated traits-correlated uniqueness (CTCU) and correlated traits-correlated methods (CTCM) strategies. We also evaluated whether the factor structure of the RSES presents invariance between men and women. The results indicated that the one-factor structure with correlated measurement errors for the negative items (Model 4; CTCU) presented better fit than the other models. Nevertheless, Model 8 (CTCM), unifactorial with two latent factors to control for the method effect of the positive and negative items, also presented good fit to the data. The analyses also indicated that the model that best fit the data (Model 4) was equivalent between men and women, with only invariance of the residuals not being observed.
With respect to the factor structure, our results are in line with previous studies that have indicated better fit of single-factor solutions for the RSES that include control for the method effect associated mainly with negative items, but also with positive ones (Lindwall et al., 2012; Marsh et al., 2010; Michaelides et al., 2016). In this sense, the two-factor structure of the RSES, commonly observed in Brazilian studies, can be understood as an artifact derived from the wording of the items in positive and negative form, not as two distinct self-esteem factors. We therefore recommend that the RSES be considered a general measure of self-esteem, in a more parsimonious representation of this construct, as originally proposed by Rosenberg (1965).
The method effect associated with the wording of the items in the structure of the RSES is not a problem specific to this scale. It has been observed in other scales that use similar numbers of positively and negatively phrased items, such as the General Health Questionnaire and the Need for Cognition Scale. Some studies have indicated that these scales are better interpreted through a single-factor structure with control for the effect of the negative items, instead of the two-factor structure commonly seen in studies that adopt exploratory factor techniques (Gouveia, Lima, Gouveia, Freires & Barbosa, 2012; Molina, Rodrigo, Losilla & Vives, 2014; Zhang, Noor & Savalei, 2016).
Regarding the studies conducted in Brazil about the factor structure of the RSES, most have pointed to positive and negative self-esteem as two distinct factors (Avancini et al., 2007; Meurer et al., 2012; Sbicigo et al., 2010). Only one study, conducted by Hutz and Zanon (2011), indicated a single-factor structure of the RSES. However, those authors, in analyzing the data via exploratory factor analysis, used a criterion that is not very robust (the scree plot) to justify retaining only one factor, disregarding more reliable techniques like parallel analysis (Tabachnick & Fidell, 2012) and contrary evidence from their own data (e.g., the eigenvalues higher than 1 obtained by two factors). In this respect, our study provides adequate empirical evidence to treat the Brazilian version of the RSES as a single-factor measure when controlling for the wording effect of the negative and positive items.
Furthermore, the results described here support the use of correlated traits-correlated uniqueness (CTCU) and correlated traits-correlated methods (CTCM) models to study the wording effect of the items (Marsh et al., 2010). By using these two types of methods to examine the wording effect, we aimed to establish whether the models that include correlated measurement errors or latent factors related to the method effect better fit the data in comparison with models that do not include these methods (Lindwall et al., 2012). The results indicated that Model 4, in which the errors of the negative items were correlated (CTCU), presented slightly better fit indices than those of Model 8, which had two latent factors predicting the effect of negative and positive items (CTCM). Other studies involving the RSES have observed that both the CTCM model, similar to Model 8 (Michaelides et al., 2016; Salerno, Ingoglia & Coco, 2017; Wu et al., 2017), and the CTCU model, similar to Model 4 (DiStefano & Motl, 2006), have the best fit, although the indices of the two are close. Some studies, however, have indicated that CTCU models in general present better fit than CTCM models (Lindwall et al., 2006; Molina et al., 2014), as was observed in this study. A possible explanation is that in allowing the residuals of the items to be correlated, the CTCU models might take into consideration not only the variance related to the wording effect of the items, but also unknown factors (Wu et al., 2017). In contrast, the CTCM models specify the wording effect of the items explicitly, which allows examining the empirical relations of this effect with relevant measures of attitude or personality (Lindwall et al., 2012). However, a worse fit can be observed when other unknown factors are present, but not specified in the model (Molina et al., 2014). According to Wu et al. (2017), CTCU models are more stable than CTCM models and less susceptible to generating non-convergent solutions in the CFA. However, our Model 5, which is a CTCU model, did not converge to a viable solution because it was unidentified. Similar results have been observed in other studies (Michaelides et al., 2016; Vasconcelos-Raposo et al., 2012; Supple et al., 2012; Wu et al., 2017), so additional research is necessary to identify the causes behind the problems of non-convergence found in the models.
It should also be noted that both the wording of the negative items (Models 4 and 8) and that of the positive ones (Model 8) are related to the method effect, although the negative items had a more significant impact, given the better fit of Model 4 compared to Model 2. In fact, both the negative and positive items can introduce variation in the measurement that is not associated with the construct being studied (Wu, 2008). The possible sources of the method effect associated with the negative items include careless responses and acquiescence bias (Wu, 2008). The negative items require the respondent to engage in a more elaborate cognitive process at the moment of responding to the item, resulting in additional variance in the measurement process (Podsakoff et al., 2012; Weijters, Baumgartner & Schillewaert, 2013). This effect is also related to the positive items, because the presence of the method effect is related to the way the participants interpret the content of the items, which in this case can also be influenced by the modesty bias, commonly observed in countries with collectivist orientation (Wu, 2008).
The invariance analyses indicated that the model that best fit the data (Model 4) was equivalent between men and women at the configurational, metric and partial scalar levels, when the method effects associated with negative items were controlled. This is the first evidence of invariance of the RSES in a Brazilian sample. The equivalence between men and women indicates that the measurement of self-esteem with the Brazilian version of the RSES has the same underlying theoretical structure and psychological significance for men and women, corroborating other findings in the international literature (DiStefano & Motl, 2009; Vasconcelos-Raposo et al., 2012; Michaelides et al., 2016; Tomás et al., 2013). Nevertheless, when evaluating the scalar invariance, we observed that item 3 (On the whole, I am inclined to believe that I am a failure) was the only one that operated differently between men and women. Similar results were obtained by Vasconcelos-Raposo et al. (2012) in a sample of Portuguese men and women. According to the authors, when considering the content of this item, it is possible to suggest that the differences in its scoring between men and women can be the consequence of distinct social and cultural practices underlying gender roles.
Due to the partial scalar invariance between men and women, we compared the latent means and observed that the differences between men and women were not significant. However, the raw scores indicated a marginally significant difference. These results differ from others reported in the literature, in which comparisons of the means of the latent factors have indicated that men have significantly higher levels of self-esteem than women (DiStefano & Motl, 2009; Lindwall et al., 2012; Michaelides et al., 2016; Tomás et al., 2013). Nevertheless, some studies conducted in Brazil have not indicated significant differences between men and women (Sbicigo et al., 2010; Hutz, Zanon & Vazquez, 2014). This disparity of findings indicates that the differences in self-esteem observed in other studies should not be interpreted as something inherent to gender, but instead as due to contextual factors.
Although the results presented here provide satisfactory evidence for the method effect of the negative and positive items in the factor structure of the RSES and about the gender invariance of this measure, some limitations should be mentioned. Previous studies have indicated that wording effect is associated with the characteristics of the respondents, not with a statistical property of the scale (Wu, 2008). In this sense, the present study has a limitation since we did not evaluate possible dispositional and situational determinants related to the wording effect of the items. For example, certain personality traits are apparently associated with the method effect, such as conscientiousness and neuroticism, which predict, respectively, high scores on negative items and low scores on positive ones (Michaelides et al., 2016). Differences in relation to the wording effect of items were also observed among respondents from collectivist and individualist societies (Wu, 2008).
Another limitation is related to our sample, mostly composed of university students. As proposed in previous studies (DiStefano & Motl, 2006; Wu et al., 2016), the method effects can vary among populations and be more important for certain groups than for others. For example, Lindwall et al. (2016) indicated that negatively worded items can be problematic for groups with relatively low verbal skills, but not for those with higher skills. The absence of evaluation of these dispositional factors and the high variability in the sample hinder comprehension of possible individual differences related to the method effect. In this respect, future studies could evaluate the relationship of dispositional, contextual and sociodemographic factors in the responses to negative and positive items in more diverse Brazilian samples.
Furthermore, the methods used here (CTCU and CTCM), although controlling for the wording effect of the items, do not exclude this effect in the evaluation of the fit of the factor structure. In other words, there is a tendency for methods based on CFA to present a medium to poor fit to the proposed factor structure when the effect of the items' wording is controlled (Maydeu-Olivares & Steenkamp, 2018). Future studies could adopt other modeling strategies, such as random intercept models, which permit excluding the variance related to the wording effect of the items and controlling for other method effects (such as acquiescence bias), while maintaining the proposed factor structure and improving the fit indices.

Final Considerations
In summary, the results obtained indicate the adequacy of a single-factor model for the Rosenberg Self-Esteem Scale when controlling for the effect of the wording of the negative items, suggesting that the RSES is a measure of global self-esteem. The results also indicate that this scale is gender invariant, i.e., the RSES presents the same underlying theoretical structure and the same psychological meaning for women and men.

Figure 1. Schematic representations of the models tested.
For the normalized chi-square metric (χ²/df), values smaller than 5 indicate adequate adjustment of the model, although values smaller than 3 are desirable. Values of the CFI and TLI greater than 0.90 indicate acceptable fit, while values higher than 0.95 indicate good fit. For the RMSEA, values up to 0.08 denote acceptable fit, while values up to 0.06 indicate good fit. Finally, in relation to the indices for comparison between the models (AIC and ECVI), lower values indicate the model has better fit.
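The cutoffs above can be applied mechanically once the indices are computed from the chi-square statistics. The sketch below uses the standard textbook formulas for RMSEA, CFI and TLI; software packages differ in small details (for example, whether N or N - 1 appears in the RMSEA denominator), so results may deviate slightly from AMOS output. The function names and the input values are hypothetical and do not come from Table 1.

```python
import math


def fit_indices(chi2, df, chi2_base, df_base, n):
    """Compute chi2/df, RMSEA, CFI and TLI from the model chi-square and
    the chi-square of the baseline (independence) model."""
    excess_model = max(chi2 - df, 0.0)
    excess_base = max(chi2_base - df_base, excess_model, 0.0)
    ratio_model = chi2 / df
    ratio_base = chi2_base / df_base
    return {
        "chi2_df": ratio_model,
        "rmsea": math.sqrt(excess_model / (df * (n - 1))),
        "cfi": 1.0 - excess_model / excess_base if excess_base > 0 else 1.0,
        "tli": (ratio_base - ratio_model) / (ratio_base - 1.0),
    }


def rate(value, good, acceptable, lower_is_better):
    """Label an index as good, acceptable or poor. Values exactly at a
    threshold are given the more favorable label."""
    if lower_is_better:
        if value <= good:
            return "good"
        return "acceptable" if value <= acceptable else "poor"
    if value >= good:
        return "good"
    return "acceptable" if value >= acceptable else "poor"


def rate_all(indices):
    """Apply the cutoffs stated in the text to each index."""
    return {
        "chi2_df": rate(indices["chi2_df"], 3.0, 5.0, lower_is_better=True),
        "rmsea": rate(indices["rmsea"], 0.06, 0.08, lower_is_better=True),
        "cfi": rate(indices["cfi"], 0.95, 0.90, lower_is_better=False),
        "tli": rate(indices["tli"], 0.95, 0.90, lower_is_better=False),
    }


# Hypothetical inputs for illustration only.
example = fit_indices(chi2=70.0, df=25, chi2_base=2000.0, df_base=45, n=689)
labels = rate_all(example)
```

AIC and ECVI are left out because they have no absolute cutoffs; as stated above, they are only meaningful when compared across competing models fitted to the same data.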

Table 1. Fit indices of the models and for the measurement invariance test