Factorial validity and reliability of the General Health Questionnaire (GHQ-12) in the Brazilian physician population

The 12-item General Health Questionnaire (GHQ-12) is a widely used screening instrument. One- and two-factor structures have been identified in some countries. In Brazil, the best factor structure is still unclear. This study aimed to assess its factorial validity and reliability, and to test the one-factor and two-factor models. The participants were 7,512 Brazilian physicians, who answered the GHQ-12 and demographic questions. Unrotated (one-factor) and rotated (two-factor) structures of the GHQ-12 were extracted by principal component analysis. Confirmatory factor analyses (maximum likelihood, ML) were used to compare the one- and two-factor solutions. The two-factor model fitted the data better than the one-factor model. The two factors were depression and social dysfunction; they were positively correlated with one another and showed adequate reliability coefficients. The two-factor model is the more adequate one, showing better fit indices, although it is acceptable to admit a common factor, which could be defined as psychological distress.

Mental Health; Physicians; Questionnaire

Introduction

There has been a considerable increase in the number of people reporting mental symptoms (such as depression or anxiety) that could be confused with organic problems, and therefore be treated erroneously. Diagnostic tools exist, for instance those based on the Diagnostic and Statistical Manual of Mental Disorders 1, whose criteria for such diagnoses are the presence of symptoms, their continued duration, and the corresponding deficit in psychological functioning. Despite the heuristic and practical nature of these classifications, they rely on the patient's behaviors and his or her complaints. Such information is often unreliable and diffuse. There is thus a clear need for objective measures that assess exclusively current mental health symptoms 2,3.
The General Health Questionnaire (GHQ) was developed by Goldberg in the 1970s to achieve this goal 4. The original GHQ is composed of 60 items. However, shortened versions of this instrument are currently available, differing in the number of items (e.g., 30, 28, and 12). The GHQ-12, i.e. the 12-item version, has probably been the most popular owing to its brevity. A Google Scholar search (http://scholar.google.com.br/scholar?q=ghq-12&hl-pt-BR&lr=, accessed on 05/Jul/2008) using GHQ-12 as a keyword identified 4,410 papers. This version is used in many countries and languages 3,5,6,7,8,9.

Gouveia VV et al. Cad. Saúde Pública, Rio de Janeiro, 26(7):1439-1445, jul, 2010

This instrument asks whether the respondent has experienced a particular symptom or behavior recently. Each item is rated on a four-point scale (less than usual, no more than usual, rather more than usual, or much more than usual), using one of the two most common scoring methods: dichotomous (0-0-1-1) or Likert type (0-1-2-3).

Considering that the GHQ-12 is a brief, simple and easy-to-complete instrument, and that its application in research settings as a screening tool is well documented 10,11, we decided to check its psychometric properties in a sample of Brazilian physicians. Although there is evidence of the validity and reliability of this measure in this cultural milieu 12,13,14, most of the studies are exploratory in nature and consider only one state. Moreover, there is no consensus about the number of factors to extract from the GHQ-12 in Brazil. For instance, Sarriera et al. 13 identified three factors in Rio Grande do Sul, using principal components analysis (varimax rotation), and Borges & Argolo 12, in Rio Grande do Norte, found two factors when using principal axis factoring (oblimin rotation).
On the other hand, at least in other countries, researchers have often compared two-factor and one-factor models by confirmatory factor analysis, concluding that the former has a more adequate fit 2,6,15. Nevertheless, one- and three-factor models have also been compared 2,16,17. Usually, the two-factor solution (depression and social dysfunction) accounts for between 45.3 and 56.5 per cent of the total variance 3,7, with an internal consistency close to 0.80 12,13.

Gouveia et al. 18 tested three factor models (1-, 2-, and 3-factor) by confirmatory factor analyses. They concluded that the most adequate model was the two-factor one, measuring depression and anxiety (social dysfunction), which showed reasonable Cronbach's alpha coefficients (0.81 and 0.66, respectively). However, their sample was specific to João Pessoa, a medium-sized city in the Northeast of Brazil. Oliveira 19, in this same city, studied a sample of 246 health professionals, including 98 psychologists, 81 physicians and 67 nurses, all of whom answered the GHQ-12. She performed only exploratory factor analysis, without comparing different factor models for this measure. Two factors were observed, explaining 51.5% of the total variance, with alphas of 0.83 (psychological distress) and 0.76 (lack of self-efficacy). There is no information about the fit of this model to the data.

These considerations motivated the current study. Its objectives were three-fold. Responding to the recommendation to consider a large sample 2, it aimed at (1) identifying the factor structure underlying the 12 items of the GHQ by performing an exploratory factor analysis; (2) testing the two most common factor models used to explain the data obtained by this measure, as discussed in the literature (one- and two-factor models); and (3) gathering evidence of its homogeneity and reliability.
In sum, this study sought evidence of the factorial validity and reliability of the GHQ-12 in a large physician sample from all 26 Brazilian states and the Federal District. Physicians demand mental health attention because they are a professional group that often works in stressful labor contexts and experiences many symptoms of mental illness 19,20.


Method

Participants
A national mail survey was carried out from December 2005 to August 2006. Taking into account the Brazilian physician population at that time (281,939), we randomly selected 67,468 physicians, to whom the questionnaires were sent. The response rate was 11.8% (7,700), which was consistent with previous studies in this cultural milieu 21. The effective sample comprised 7,512 physicians, who answered all 12 items of the GHQ. Most of them were male (63.1%), married (75.7%), and had children (78.1%). Their mean age was 47.2 years (standard deviation, SD = 11.28; range 24 to 93; 95.1% under 65 years old). The detailed method is described elsewhere 22.

Instrument
All participants answered a questionnaire comprising different psychological measures (e.g., fatigue, suicidal ideation) and demographic questions. The Brazilian-Portuguese version of the GHQ-12 18 was also included. The Likert-type answer scale described above was adopted. The psychometric properties of this measure in Brazil and other cultures were detailed previously. Only this measure is considered in this article.
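The two scoring methods mentioned for the four-point answer scale can be illustrated with a minimal Python sketch. It assumes that each answer is stored as the index (0 to 3) of the chosen option, in the order in which the options are listed; the function name `score_ghq12` and this encoding are illustrative assumptions and are not taken from the study materials.

```python
# Weights per answer option, as quoted in the text.
LIKERT = (0, 1, 2, 3)
DICHOTOMOUS = (0, 0, 1, 1)

def score_ghq12(responses, method="likert"):
    """Sum the 12 item scores under the chosen scoring method.

    responses: 12 integers in 0..3 (hypothetical encoding: index of the
    chosen answer option). method: "likert" or "dichotomous".
    """
    if method not in ("likert", "dichotomous"):
        raise ValueError("method must be 'likert' or 'dichotomous'")
    if len(responses) != 12:
        raise ValueError("the GHQ-12 has exactly 12 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2 or 3")
    weights = LIKERT if method == "likert" else DICHOTOMOUS
    return sum(weights[r] for r in responses)

# Hypothetical respondent, for illustration only.
example = [0, 1, 2, 3] * 3
likert_total = score_ghq12(example)                       # range 0..36
binary_total = score_ghq12(example, method="dichotomous")  # range 0..12
```

Under these weights the Likert total ranges from 0 to 36 and the dichotomous total from 0 to 12; the study reported here used the Likert option.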

Procedure
The dataset of Brazilian physicians was requested from the Federal Council of Medicine, Brazil. Taking into account this dataset, potential participants for this study were selected. To each selected physician, a questionnaire was sent on one double-sided sheet, and his/her voluntary and anonymous participation in the study was requested. All participants were informed that filling in and returning the questionnaire was considered as acceptance of the term of free and informed consent.

Statistical analyses
The reliability of the measures was examined in relation to the instrument's internal consistency (Cronbach's alpha coefficients) and homogeneity (mean inter-item correlations). Cronbach's alpha coefficients of 0.70 or higher and mean inter-item correlations in the 0.20 to 0.40 range were deemed to indicate good reliability 23. Exploratory factor analysis was performed using principal components, and confirmatory factor analysis was carried out using Analysis of Moment Structures (AMOS), version 16, with maximum-likelihood estimation, taking the observed covariance matrix as the input. This procedure has been used in previous studies 2. The degree to which the data fit the confirmatory models was assessed using the adjusted goodness-of-fit index (AGFI), the comparative fit index (CFI), and the root mean square error of approximation (RMSEA). Models with AGFI and CFI values close to 0.90 or higher, and RMSEA of 0.08 or lower, indicate acceptable fit 24,25.
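The two reliability indicators named above can be sketched in a few lines of Python. This is a minimal illustration using the textbook definitions (alpha from item and total-score variances, homogeneity as the average Pearson correlation over all item pairs); the function names and the toy data are assumptions made for illustration and do not reproduce the authors' software or data.

```python
from itertools import combinations
from statistics import mean, variance

def cronbach_alpha(rows):
    """rows: one sequence of item scores per respondent."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one column per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_inter_item_r(rows):
    """Average correlation over all distinct pairs of items."""
    items = list(zip(*rows))
    return mean(pearson(a, b) for a, b in combinations(items, 2))

# Toy data (hypothetical): 5 respondents, 3 items scored 0..3.
toy = [(0, 1, 0), (1, 1, 2), (2, 2, 1), (3, 2, 3), (1, 0, 1)]
alpha = cronbach_alpha(toy)
homogeneity = mean_inter_item_r(toy)
```

With the thresholds quoted in the text, `alpha` would be compared against 0.70 and `homogeneity` against the 0.20 to 0.40 range.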
The alternative factor models of the GHQ-12 were compared using three further indices. Specifically, the χ² difference test (Δχ²), the expected cross-validation index (ECVI) and the consistent Akaike information criterion (CAIC) were used to assess improvements over competing models. A significant χ² difference test favors the model with the lower χ² value, and lower ECVI and CAIC values reflect better fit 26.

Descriptive statistics
Table 1 presents the means (m), SD, and correlations between items of the GHQ-12. The item means ranged from 1.23 (Item 11: Been thinking of yourself as a worthless person?) to 2.38 (Item 7: Been able to enjoy your normal day-to-day activities?). The mean inter-item correlation of the total set of items was 0.42 (p < 0.001), ranging from 0.22 (p < 0.001; items 2 and 4) to 0.65 (p < 0.001; items 5 and 9).

Factor structure
Before performing the factor analysis, the adequacy of the correlation matrix of the 12-item GHQ was checked (Table 1). The observed values supported this type of statistical analysis: KMO = 0.93 and Bartlett's test of sphericity, χ² (66) = 38,705.09, p < 0.001. Thus, it was appropriate to conduct the exploratory factor analysis. Two steps followed: first, the unrotated solution was produced, and then a varimax rotation was applied, assuming orthogonal factors. In both cases, a strict factor-loading cutoff of > 0.50 was adopted, as used by other researchers 27. Results are described below.
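As an aside on the sphericity test, the sketch below uses the standard textbook form of Bartlett's statistic, χ² = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, which gives 66 degrees of freedom for 12 items, in agreement with the value reported above. The function names and the small example matrix are illustrative assumptions; the KMO index is not covered here.

```python
import math

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix (Cholesky)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(matrix[i][i] - s)
                log_det += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return log_det

def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for a correlation matrix."""
    p = len(corr)
    stat = -(n_obs - 1 - (2 * p + 5) / 6.0) * log_det_spd(corr)
    df = p * (p - 1) // 2
    return stat, df

# Hypothetical 2 x 2 correlation matrix and sample size, for illustration.
stat, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=100)

# Degrees of freedom for a 12-item instrument such as the GHQ-12.
df_ghq12 = 12 * (12 - 1) // 2
```

A large statistic relative to its degrees of freedom indicates that the correlation matrix departs from an identity matrix, which is the precondition for factoring it.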

Unrotated factor structure
Factor analysis was carried out using principal component (PC) analysis, in line with previous research 3. This analysis, without fixing the number of factors to extract, identified two factors with eigenvalues greater than 1 (Kaiser's criterion): 5.67 and 1.18, jointly accounting for 57.1% of the total variance. This solution clearly produced a general unipolar factor, with all items showing positive loadings > 0.50, ranging from 0.57 (Item 3: Felt that you are playing a useful part in things?) to 0.81 (Item 9: Been feeling unhappy and depressed?). The second factor aggregated items with lower factor loadings; only item 3 attained the factor-loading cutoff. This factor was therefore set aside at this stage.
The number of factors to extract was based on three criteria: eigenvalues greater than 1 (Kaiser), the scree test (Cattell), and parallel analysis (Horn). According to the first two criteria, the extraction of two factors seems evident. The final decision to extract these factors was based on parallel analysis, in which only the first two observed eigenvalues were higher than those obtained from 1,000 replications of random data with the same number of items and the same sample size 28.

Varimax rotation structure
When two factors were extracted with varimax rotation, in search of a simple structure, the PC analysis revealed a clear solution. According to the first column of Table 2, the 12 items were equally distributed between two principal factors. The eigenvalues after the rotation were 3.82 and 3.03, jointly explaining 57% of the total variance. The cutoff for defining an item as representing a factor was a factor loading > 0.50. The items loading on the first factor (e.g., constantly under strain, lost sleep over worry, and unhappy or depressed) seem to express the construct depression, whereas those loading on the second factor (e.g., play useful part in things, capable of making decisions, and thinking of self as worthless) express the social dysfunction construct. These two factors were positively correlated with each other (r = 0.69, p < 0.001).
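The percentages of explained variance follow directly from the reported eigenvalues: in a principal component analysis of a correlation matrix, the total variance equals the number of items (12 here), so each eigenvalue divided by 12 gives that component's share. The short Python check below is only a worked example of this arithmetic; the function name is an illustrative assumption.

```python
N_ITEMS = 12  # the GHQ-12 has 12 standardized items, so total variance = 12

def pct_variance(eigenvalues, n_items=N_ITEMS):
    """Percentage of total variance accounted for by the given eigenvalues."""
    return 100.0 * sum(eigenvalues) / n_items

unrotated = (5.67, 1.18)  # eigenvalues reported for the unrotated solution
rotated = (3.82, 3.03)    # eigenvalues reported after varimax rotation

pct_unrotated = pct_variance(unrotated)
pct_rotated = pct_variance(rotated)
```

Both pairs sum to 6.85, which illustrates that an orthogonal rotation redistributes variance between the two factors without changing the total explained.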

Testing uni- and two-factor models
First, the univariate and multivariate distributions of the items were checked. Considering the absolute values of the univariate skewness, ML (maximum likelihood) estimation is regarded as more adequate for large samples 30, a condition under which the impact of sampling error is minimized 31. In line with these recommendations, ML estimation was adopted.
Given the two factor solutions discussed in the literature, i.e. the one-factor and two-factor models, we assessed their fit to the data and compared them with each other. In the first model (M1), all 12 items were defined to load on only one factor; in the second model (M2), the items were set to load on two factors (Ф = 0.15), according to the solution extracted in the previous PC analysis. In both models, all item loadings were statistically different from zero (z > 1.96, p < 0.05). Fit indices were AGFI = 0.84, CFI = 0.88, and RMSEA = 0.11 (90% confidence interval, CI90%: 0.103-0.108) for M1, and AGFI = 0.90, CFI = 0.92, and RMSEA = 0.088 (CI90%: 0.086-0.091) for M2. Comparing these nested models, the latter proved more adequate than the former [Δχ² (1) = 1,431.93, p < 0.001]. The respective values of CAIC and ECVI for M1 (4,818.69 and 0.616) and M2 (3,396.68 and 0.426) support this finding.
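The reported comparison indices can be cross-checked against each other with a short Python sketch. Several points are assumptions and not statements from the article: the second pair of CAIC and ECVI values is taken to belong to M2; the usual definitions CAIC = χ² + q(ln N + 1), ECVI = (χ² + 2q)/(N - 1) and RMSEA = sqrt(max(χ² - df, 0)/(df(N - 1))) are used; and the free-parameter counts q = 24 (M1) and q = 25 (M2) assume 12 loadings, 12 error variances and, for M2, one factor covariance. The model χ² values are not reported, so they are back-calculated here purely as a consistency check.

```python
import math

N = 7512               # respondents with complete GHQ-12 data (from the text)
MOMENTS = 12 * 13 // 2  # distinct variances and covariances of 12 items

def caic_penalty(q, n_obs):
    return q * (math.log(n_obs) + 1.0)

def ecvi(chi2, q, n_obs):
    return (chi2 + 2 * q) / (n_obs - 1)

def rmsea(chi2, df, n_obs):
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Assumed numbers of free parameters (not stated in the article).
q1, q2 = 24, 25
df1, df2 = MOMENTS - q1, MOMENTS - q2   # 54 and 53

# Reported CAIC values; chi-square is back-calculated under the assumptions.
chi2_m1 = 4818.69 - caic_penalty(q1, N)
chi2_m2 = 3396.68 - caic_penalty(q2, N)

delta_chi2 = chi2_m1 - chi2_m2          # compare with the reported 1,431.93
p_value = chi2_sf_df1(delta_chi2)       # df1 - df2 = 1
ecvi_m1, ecvi_m2 = ecvi(chi2_m1, q1, N), ecvi(chi2_m2, q2, N)
rmsea_m1, rmsea_m2 = rmsea(chi2_m1, df1, N), rmsea(chi2_m2, df2, N)
```

Under these assumptions the back-calculated difference agrees with the reported Δχ² to within rounding, the ECVI values round to those reported, and the RMSEA for M1 falls inside its reported confidence interval, which suggests that the printed 0.11 is a rounded value.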

Discussion
The current study aimed to gather evidence of the factorial validity and reliability of the GHQ-12 in the Brazilian physician population. It considered a large sample, corresponding to approximately 3% of these professionals in Brazil, including participants from all 26 states and the Federal District. Despite this effort, it is important to note that the aim was not to generalize the findings to the whole country. This study was psychometric in nature, assessing the measurement parameters of a specific instrument. Perhaps its main limitation was not testing the impact of different demographic variables (state, gender) on the factorial structure of the GHQ-12. However, this was not its objective. One methodological aspect might demand attention in the future: the use of ML estimation with non-normal item distributions. In line with Ory & Mokhtarian 30, it could be interesting to explore different estimation procedures (e.g., ML, ADF, bootstrapping).
As previously mentioned, only one study was found in Brazil in which the factor structure of the GHQ-12 was assessed among physician participants 19. However, that study used a small sample and performed only an exploratory factor analysis. Our study therefore improves on it by running exploratory and confirmatory factor analyses, testing two common factor models for this measure (i.e., uni- and bi-factor 5,7,15,18), and presenting information about the homogeneity and reliability of the corresponding factors.
The findings reported in the current study support the psychometric appropriateness of the GHQ-12. For instance, our findings were consistent with those described by Werneke et al. 3. When an unrotated solution was defined, a one-factor model seemed most adequate, which could be named psychological distress. However, with varimax rotation, it was possible to find two factors, which explained close to 60% of the total variance. Although these factors were named differently in previous studies in Brazil 12,18, most items comprising each factor observed in this study clearly reproduce the most common two-factor model (depression and social dysfunction) identified in other studies 2,3. In line with previous findings, in this study the two-factor model was better than the one-factor model 15,18. Overall, the goodness-of-fit indexes for the two-factor model were acceptable 24,25, and coherent with those observed in the literature 2. Finally, the homogeneity and reliability were higher than the recommended cutoffs 23. For instance, Cronbach's alpha was always higher than 0.80.
Future studies should test the factor invariance of the two-factor model of the GHQ-12. For instance, this test could consider demographic (e.g., gender, age) and sociocultural (e.g., ethnic group, regional culture) variables. It would also be important to examine the criterion-related validity of the GHQ-12, considering some relevant psychiatric symptoms (e.g., suicidal ideation, negative affect) and indicators of work-related stress (e.g., fatigue, burnout). Finally, it would be advisable to establish the sensitivity and specificity of this measure. In this case, formal psychiatric diagnoses of depression or anxiety among physicians could serve as the gold standard.

Contributors

V. V. Gouveia was responsible for designing the research, data analysis and writing the article. G. A. Barbosa participated in the discussion about the data used in the article. E. O. Andrade made a significant contribution towards the conception and scope of the study. M. B. Carneiro participated in the analysis and interpretation of the data.

Table 2
Exploratory and confirmatory factorial structure of the 12-item General Health Questionnaire (GHQ).