Measurement of depression in the Brazilian population: validation of the Patient Health Questionnaire (PHQ-8)

We aimed to evaluate the psychometric properties of the Brazilian version of the Patient Health Questionnaire (PHQ-8). The study used a sample of 4,170 individuals (≥ 15 years old) from an urban area. Two-stage cluster sampling was adopted (census tracts and streets), with estimates weighted by sample weights. A structured questionnaire with sociodemographic data, the PHQ modules for depression, generalized anxiety disorder and panic disorder, and the Self-Reporting Questionnaire (SRQ-20) were used. In the evaluation of the PHQ-8, we verified construct validity by analyzing the dimensional structure, convergent validity and internal consistency. We found a linear disorder without losses to maintain the four response categories. The factor analysis found unidimensionality of the depression construct, with strong factor loadings, low residual variances, low residual correlation between items, good model fit, and satisfactory internal consistency and convergent factorial validity (high loadings and correlations with other tests/scales of similar constructs). The PHQ-8 has a one-dimensional structure with evidence of good validity and reliability and is suitable for use in the Brazilian population.


Introduction
Depression is a highly prevalent disease worldwide 1,2 , with an increasing trend in recent years 3 . Brazil is among the five countries with the highest rates in the world. Depression is a treatable mental disorder characterized by the presence of affective symptoms such as sadness, empty or irritable mood, accompanied or not by somatic and cognitive changes, which affect the individual's functional ability 5 . Early identification and treatment improve the prognosis and are essential to cope with this condition and to prevent the occurrence of new episodes 6 .
Brazilian studies identifying mental disorders in general, and depression in particular, are scarce. The few existing studies focus on groups with depression 4,7 . Thus, broader analyses with active evaluation of depression in apparently healthy populations are incipient, and information about the real dimension of mental disorders in the general population is scarce. Data quantifying this gap in the Brazilian context are lacking, but a study in the United States showed that less than 5% of adults are screened for depression in primary health care services 8 , a situation worsened by the lack of instruments with good psychometric performance that are suitable for screening and diagnosis 9,10 .
The difficulties in screening and diagnosis hinder the monitoring of mental disorders, although several evaluation instruments are available 11 . Incomplete assessments of the performance of these instruments, or weaknesses and gaps observed in their assessments, constitute barriers to their effective incorporation into health care routines and clinical practice in general, making it impossible to advance in the early diagnosis of and intervention in the disease and in the consequent reduction in health expenditures.
The improvement in access to mental health services and the routine use of related diagnostic tools that are reliable, easy to handle and apply, with good performance, are challenges in this field 8 . Investing efforts in the assessment of instruments that can increase safety in their use is essential.
The scientific literature describes at least 16 instruments for detection of depression used in primary health care 11 . The PHQ is used worldwide in many contexts and has been translated into several languages 8,12,13,14 . It is an instrument of easy and quick application, with fewer items and satisfactory estimates of sensitivity and specificity 13,15 .
The PHQ is a short, self-administered version of PRIME-MD 16 , based on criteria for major depressive disorder described in the Diagnostic and Statistical Manual of Mental Disorders (DSM), organized in modules for measuring specific psychological disorders. The depression assessment module originally included nine items (PHQ-9), but subsequent assessments recommended its reduction to an eight-item version, the PHQ-8 12 .
To justify the exclusion of the ninth item proposed in the PHQ-9, "Thinking about getting hurt in some way or that it would be better to be dead", experts suggest that this change does not influence the diagnosis of the condition, given that: (a) thoughts of self-mutilation are the last item to be endorsed in the diagnosis of depression because they are uncommon in the general population 10,12,17 ; (b) in the validation studies of the PHQ-9, the item does not accurately discriminate the presence or absence of depression, presenting a lower factor loading 18,19 , sometimes with cross-loadings on other constructs, such as anxiety 20 ; (c) its omission does not significantly alter the instrument's sensitivity and specificity 12,15,21,22,23 , nor the reliability indicators 18,19 , with evidence that the measures produced by the 8- or 9-item version are similar, with high correlation indicators between them 13,18,21,22 .
Besides, the exclusion of this ninth item, related to suicidal ideation, has been suggested in population-based epidemiological studies with a low risk of severe suicidal ideation, that is, in populations with no indication of previous mental illness. Given the rather sensitive nature of the item, its exclusion has also been indicated in contexts of insufficient financial resources, where mental health specialists are likely to be unavailable 12 . These issues strengthen the relevance of using the PHQ-8 to measure depression in the Brazilian context in population-based epidemiological studies.
Despite the metric qualities of PHQ-8 in the international context, its psychometric properties have not yet been evaluated in the Brazilian context. Valid and reliable instruments are necessary for screening and clinical confirmation and can be used in the routines of health care services and in studies with large samples. Thus, this study aimed to evaluate the psychometric properties of the Brazilian version of the PHQ-8 in an urban population.

Study design and sampling
This study is an offshoot of the project Surveillance in Mental Health and Work: a Cohort of the Population of Feira de Santana, Bahia, Brazil, steered by the Epidemiology Center (NEPI) of the State University of Feira de Santana (UEFS).
Data concern a probabilistic sample of the population aged ≥ 15 years living in the urban area of Feira de Santana in 2007, obtained through face-to-face interviews in which a structured questionnaire was applied. In the original study, which aimed to estimate the prevalence of mental disorders in urban populations, a sample size of 1,868 participants was estimated, considering the urban population aged ≥ 15 years (N = 422,282), a prevalence of common mental disorders (CMD) of 24% 35 , a 95% confidence interval (95%CI), 3% precision, a design effect of 2 and an increase of 20% to allow for possible refusals and losses. A total of 4,170 individuals were interviewed, enabling the estimation of a model with up to 83 parameters (50 observations per parameter).
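The sample size arithmetic above can be roughly reproduced with the usual formula for estimating a proportion. The sketch below is an illustration only: it assumes the infinite-population formula n = z² p(1 − p) / d², multiplied by the design effect and by the 20% allowance for losses. The source does not state which formula the authors used, and the function name is hypothetical.

```python
from statistics import NormalDist


def required_sample_size(prevalence, precision, confidence=0.95,
                         design_effect=1.0, loss_allowance=0.0):
    """Sample size for estimating a proportion (assumed standard formula)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size under simple random sampling for the target precision.
    n_srs = z ** 2 * prevalence * (1 - prevalence) / precision ** 2
    # Inflate for clustering and for expected refusals and losses.
    return n_srs * design_effect * (1 + loss_allowance)


n = required_sample_size(0.24, 0.03, design_effect=2, loss_allowance=0.20)
print(round(n, 1))   # lands close to the 1,868 reported in the text
print(4170 // 50)    # parameters supported at 50 observations each
```

Under these assumptions the result falls between 1,868 and 1,869, which is consistent with the reported figure; a finite-population correction would lower it slightly.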
Participants were selected through a complex sampling procedure comprising two clustering stages (census tracts and streets) and stratified according to subdistricts. The details of these procedures are found elsewhere 36 . The sample weight was estimated to compensate for different selection probabilities at each sampling stage, considering different weightings for each element of the sample.
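As a minimal sketch of how such weights are commonly built, the example below takes the base weight as the inverse of the product of the stage-wise selection probabilities and uses it in a weighted proportion. The probabilities, the function names and the absence of any further adjustment (for example, for non-response) are assumptions for illustration; the actual procedure is described in the cited reference 36.

```python
def sampling_weight(stage_probabilities):
    """Base weight: inverse of the overall selection probability."""
    overall = 1.0
    for p in stage_probabilities:
        if not 0 < p <= 1:
            raise ValueError("selection probabilities must lie in (0, 1]")
        overall *= p
    return 1.0 / overall


def weighted_proportion(indicators, weights):
    """Weighted share of respondents whose indicator equals 1."""
    return sum(w * x for x, w in zip(indicators, weights)) / sum(weights)


# Hypothetical example: census tract drawn with probability 0.10,
# then a street within that tract drawn with probability 0.25.
w = sampling_weight([0.10, 0.25])
print(w)  # each such respondent stands for roughly 40 people

# Two hypothetical respondents with different weights.
print(weighted_proportion([1, 0], [40, 10]))
```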

Data collection instruments
A structured questionnaire containing eight blocks of questions was used in data collection. Information on sociodemographic, economic, and mental health assessment characteristics was used for this study. The questionnaire was applied in the participants' households by a trained team after the participant signed the informed consent form.
The adapted Portuguese version of the PHQ-8 was used to evaluate depression. Depressive symptoms are assessed considering a two-week recall (Box 1).
The literature 37,38 and previous applications of the instrument indicated that the temporal gradient of the response categories in the original version did not allow respondents to position themselves adequately on the measured item (e.g., "several days" is similar to "more than half the days"). The research team organized workshops with mental health and psychometrics experts to discuss and propose changes. The changes aimed to discriminate the frequency of events while respecting the meanings of the original proposition. Thus, we changed the label of response category "1" to "a few days." The response categories used were therefore: "0" (none); "1" (a few days); "2" (more than half the days); "3" (almost every day). The pre-test, conducted during the first applications of the instrument, showed a better understanding of the response categories and adequate distinction of the frequency of the event.
Besides the PHQ-8, other mental health outcomes were assessed to provide parameters for comparison with the depression results. CMD were measured by the Self-Reporting Questionnaire (SRQ-20), validated for the Brazilian context and presenting satisfactory performance indicators 39 . This instrument consists of 20 items with dichotomous response categories. To screen for suspected CMD, a cut-off point of five or more positive responses is adopted for men and seven or more positive responses for women 40 .

Box 1 (excerpt). Feeling slow or restless: slowness to move or talk to the point that other people notice, or the opposite, you have been so agitated or restless that you have been moving around a lot more than usual.
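The SRQ-20 screening rule for suspected CMD (five or more positive responses for men, seven or more for women) can be sketched as follows. The function name, the "male"/"female" labels and the input format are hypothetical choices made for this illustration.

```python
# Sex-specific cut-offs as described in the text.
SRQ20_CUTOFFS = {"male": 5, "female": 7}


def srq20_suspected_cmd(answers, sex):
    """Return True when the number of positive answers reaches the cut-off.

    answers: 20 dichotomous responses (truthy = positive).
    sex: "male" or "female" (hypothetical labels for this sketch).
    """
    if len(answers) != 20:
        raise ValueError("the SRQ-20 has exactly 20 items")
    positives = sum(1 for a in answers if a)
    return positives >= SRQ20_CUTOFFS[sex]


five_positive = [1] * 5 + [0] * 15
print(srq20_suspected_cmd(five_positive, "male"))    # True
print(srq20_suspected_cmd(five_positive, "female"))  # False
```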
Anxiety disorders were assessed using the 22-item PHQ module, a scale with adequate psychometric performance in other populations 9 . The Brazilian version of the instrument, available free of charge (https://www.phqscreeners.com), was previously evaluated to ensure its proper use in this population (data not shown in tables). This scale evaluates two types of specific disorders:
(a) Panic disorder: a total of 15 items that assess anxiety and panic attack symptoms with dichotomous response categories (yes or no). Panic disorder is diagnosed in case of positive responses to the four items of anxiety symptoms (presence of anxiety crisis or recurring and unexpected panic in the last month; apprehension or persistent concern about a new attack), associated with at least four of the 11 panic attack symptoms (severe anxiety crisis) 5,13 . The scale showed a one-dimensional structure with factor loadings > 0.58, adequate fit (comparative fit index, CFI, and Tucker-Lewis index, TLI, > 0.95; root mean square error of approximation, RMSEA, < 0.05) and adequate reliability (CR = 0.90).
(b) Generalized anxiety disorder: measured by seven items that assess the individual's frequency of anxiety-related disturbances, considering a four-week recall, with three-point Likert-type response categories: (0) "none", (1) "several days", and (2) "more than half the days". Generalized anxiety disorder is identified by the presence of four or more "more than half the days" items and a positive response to "feeling nervous, anxious, tense or very worried" 13 . This module also showed a one-dimensional structure, with loadings > 0.47, adequate fit indices (CFI and TLI > 0.95 and RMSEA = 0.05) and satisfactory reliability (CR = 0.77).

Data analysis
Initially, descriptive statistics, absolute and relative frequencies were used to characterize the sociodemographic profile of the PHQ-8 respondents. The prevalence of suicidal ideation (item 11 of SRQ-20 "thought of ending one's life") was also estimated to confirm the low frequency of this item in the studied population. The SPSS software, version 24.0 (https://www.ibm.com/), was used.
Descriptive analysis was also used to assess the distribution of the response categories of the items in the PHQ-8, using the Mplus software, version 8.4 (https://www.statmodel.com/).
An exploratory factor analysis (EFA) was initially implemented to assess the configural structure. Eigenvalues were estimated as the criterion for the number of factors to extract 41 . Then, an exploratory structural equation model (ESEM) 42 was specified to validate the one-dimensional structure indicated by the eigenvalues. Geomin oblique rotation was adopted.
Cad. Saúde Pública 2022; 38(6):e00176421
Then, confirmatory factor analysis (CFA) was employed to evaluate the identified solution 43,44 . These analyses followed the recommendations of the COnsensus-based Standards for the Selection of Health Measurement INstruments (COSMIN) 45 , as well as other prominent references in the area 41,43,44,46,47 .
An item was considered conditionally related to a specific factor when its standardized loading was ≥ 0.5, with a residual variance ≤ 0.7 46,47 . Residual correlations were evaluated to check possible semantic redundancies 44 : values between 0.3 and 0.6 suggested reassessment to aggregate their semantic contents; and values ≥ 0.7 indicated the need to remove one of the items 47 .
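These item-level criteria can be sketched in a few lines. The example assumes a fully standardized one-factor solution, in which the residual variance of an item equals one minus its squared loading; this relation is consistent with the loading of 0.61 and residual of 0.63 reported in the Results. Function names are hypothetical, and because the text leaves residual correlations between 0.6 and 0.7 unspecified, the sketch assigns them to the reassessment band as an assumption.

```python
def residual_variance(loading):
    """Residual variance of an item in a standardized one-factor model."""
    return 1.0 - loading ** 2


def item_retained(loading, min_loading=0.5, max_residual=0.7):
    """Apply both criteria: loading >= 0.5 and residual variance <= 0.7."""
    return loading >= min_loading and residual_variance(loading) <= max_residual


def residual_correlation_action(r):
    """Map a residual correlation to the action described in the text."""
    r = abs(r)
    if r >= 0.7:
        return "remove one item"
    if r >= 0.3:
        # Assumption: the unspecified 0.6-0.7 range is treated as reassessment.
        return "reassess content"
    return "keep both"


print(round(residual_variance(0.61), 2))   # 0.63
print(item_retained(0.61))                 # True
print(residual_correlation_action(0.25))   # keep both
```

Under this relation the residual criterion is the stricter of the two: a loading of exactly 0.5 implies a residual variance of 0.75, so an item needs a loading of about 0.55 or more to satisfy both conditions.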
The robust diagonally weighted least squares estimator (weighted least squares mean and variance adjusted, WLSMV) was used, considering that this is an appropriate estimator for polychotomous and ordinal items 47,48 . Model fit was evaluated by the RMSEA, CFI and TLI.
RMSEA values < 0.06 suggest a good fit, while values > 0.10 indicate inadequate fit and that the model should be rejected 44 . CFI and TLI compare the target model with a null model. Both range from zero to one, and values ≥ 0.95 indicate an acceptable fit 43 . All analyses accounted for the complex sampling design.
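The fit thresholds above translate directly into a small decision helper. This is a sketch only: the function names and the "intermediate" label for RMSEA values between 0.06 and 0.10 are assumptions, since the text does not name that range. The example call uses the fit values that the Results section reports for the one-dimensional model.

```python
def classify_rmsea(rmsea):
    """Classify RMSEA using the cut-offs stated in the text."""
    if rmsea < 0.06:
        return "good"
    if rmsea > 0.10:
        return "inadequate"
    # Assumed label: the text does not name the 0.06-0.10 range.
    return "intermediate"


def incremental_fit_acceptable(cfi, tli, threshold=0.95):
    """CFI and TLI are both expected to reach the threshold."""
    return cfi >= threshold and tli >= threshold


# Values reported for the one-dimensional PHQ-8 model.
print(classify_rmsea(0.03), incremental_fit_acceptable(0.98, 0.98))  # good True
```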
The modification indices (MI) were evaluated to diagnose cross-loadings and residual correlations. They indicate how much the model fit would improve if a parameter were freely estimated. Expected parameter changes (EPC) were observed to evaluate the direction and intensity of the estimates with the suggested modification 44,47 . Thus, a detailed assessment of the residual correlation and/or re-specification of the model was conducted after identifying MI ≥ 10 and EPC ≥ 0.25 44 .
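The joint MI and EPC rule can be illustrated as a simple filter over candidate parameters. In this sketch the function name and the parameter labels are hypothetical, and the MI values are invented for illustration; only the EPC of 0.292 for the p2 and p6 residual correlation comes from the Results.

```python
def needs_detailed_review(mi, epc, mi_cutoff=10.0, epc_cutoff=0.25):
    """Flag a parameter when both criteria from the text are met."""
    return mi >= mi_cutoff and epc >= epc_cutoff


# (label, MI, EPC); MI values are hypothetical, not taken from the study.
candidates = [
    ("p2 WITH p6", 25.0, 0.292),
    ("p1 WITH p3", 4.2, 0.100),
    ("p4 WITH p7", 12.0, 0.150),
]

flagged = [label for label, mi, epc in candidates
           if needs_detailed_review(mi, epc)]
print(flagged)  # ['p2 WITH p6']
```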
Convergent factor validity was assessed by: (a) inspection of high factor loading and (b) average variance extracted (AVE) ≥ 0.50 46,47 . External correlation of the construct "depression" with other instruments that measure theoretically related mental health outcomes was checked 45,49 . Correlations were obtained using Spearman's rank correlation test, due to the lack of normality of the scores (Shapiro-Wilk test), with a criterion of statistical significance of p ≤ 0.05. The analysis also used Stata 15.0 (https://www.stata.com).
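For reference, the AVE is usually computed as the mean of the squared standardized loadings. The sketch below assumes that definition and uses hypothetical loadings: only the minimum (0.61) and maximum (0.82) match values reported in the Results, and the remaining six are invented for illustration.

```python
def average_variance_extracted(loadings):
    """AVE as the mean of squared standardized factor loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for eight items (illustration only).
loadings = [0.70, 0.82, 0.68, 0.72, 0.61, 0.75, 0.69, 0.61]

ave = average_variance_extracted(loadings)
print(round(ave, 3))   # 0.491
print(ave >= 0.50)     # False: just below the criterion for these values
```

With loadings in this range the AVE sits near the 0.50 threshold, which illustrates why items with individually acceptable loadings can still yield a borderline AVE.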
Internal consistency was assessed using composite reliability (CR) with values ≥ 0.70 as a criterion for good consistency 41,44,46 . The 95%CI of the CR and AVE were obtained by the bootstrap method with 1,000 replications. This research was approved by the UEFS Ethics Research Committee, under opinion 2,420,653 (CAAE: 74792617.4.0000.0053).
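The composite reliability criterion described in this section can be sketched with the common formula CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), which assumes standardized loadings and uncorrelated residuals. The loadings below are hypothetical: only the minimum (0.61) and maximum (0.82) match values reported in the Results. The bootstrap confidence interval is not reproduced, since it requires the raw item responses.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings (assumed formula)."""
    sum_loadings_sq = sum(loadings) ** 2
    # Residual variance of each item in a standardized solution.
    sum_residuals = sum(1.0 - l ** 2 for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + sum_residuals)


# Hypothetical standardized loadings for eight items (illustration only).
loadings = [0.70, 0.82, 0.68, 0.72, 0.61, 0.75, 0.69, 0.61]

cr = composite_reliability(loadings)
print(round(cr, 2))   # 0.88
print(cr >= 0.70)     # True
```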

Results
The study sample consisted of 4,170 people living in the urban area of the studied municipality, mainly women (67.6%), aged up to 40 years (58.2%), with a mean age of 38.94 years (SD = 17.9), of black race/skin color (80.7%), single (51.4%) and with at most incomplete elementary schooling (44.3%). Regarding participation in the labor market, 59.1% were unemployed and 52.6% had an income of up to one minimum wage (Table 1).
The sample showed low frequency of suicidal ideation. A total of 5.3% indicated "thought of ending one's life" in the last 30 days (data not shown in the table), assessed by the SRQ-20.
The distribution of frequencies in the PHQ-8 response categories showed higher response frequencies in the "none" category (63.4% in p4 to 84.3% in p6) and lower frequencies in the category "more than half the days" (2.5% in p6 to 7.4% in p3). Items p3 and p5 showed the highest frequency in "almost every day"; on the other hand, item p6 showed the lowest frequency in this category (Table 2).
The results of the EFA showed a dominant dimension, with a large decrease between the first (4.283) and the second (0.709) eigenvalues and small decreases thereafter. Although two models (one-dimensional and two-dimensional) demonstrated an adequate fit, the presence of cross-loadings between the factors and the inspection of the eigenvalues suggest unidimensionality (data not shown in the table). ESEM confirmed the structure (Table 3).
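The two reported eigenvalues can be read with a short calculation. The sketch assumes the eigenvalues come from an 8 x 8 correlation matrix (so they sum to 8) and applies the common eigenvalue-greater-than-one rule, which the text does not name explicitly; variable names are hypothetical.

```python
N_ITEMS = 8
# First two eigenvalues as reported; the remaining six are not given.
reported_eigenvalues = [4.283, 0.709]

# Eigenvalue-greater-than-one rule (assumed retention criterion).
factors_retained = sum(1 for e in reported_eigenvalues if e > 1.0)

# Share of total variance tied to the first factor, assuming the
# eigenvalues of a correlation matrix sum to the number of items.
first_share = reported_eigenvalues[0] / N_ITEMS

# Ratio of first to second eigenvalue as a rough dominance measure.
dominance_ratio = reported_eigenvalues[0] / reported_eigenvalues[1]

print(factors_retained)            # 1
print(round(first_share, 3))       # 0.535
print(round(dominance_ratio, 1))   # 6.0
```

Because eigenvalues are ordered from largest to smallest, a second eigenvalue below one implies that all later ones are also below one, so the count of one retained factor does not depend on the unreported values.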
In the CFA, the one-dimensional model of the PHQ-8 showed satisfactory fit (CFI = 0.98; TLI = 0.98; RMSEA = 0.03; 90%CI: 0.02-0.03) and internal consistency (CR = 0.88) (Table 3). All factor loadings were higher than 0.50, with the lowest loading in items p5 and p8 (λi = 0.61) and the highest loading in item p2 (λi = 0.82), besides low residual variance values (δi < 0.70), with the maximum residual identified in items p5 and p8 (δi = 0.63) (Table 3). The model diagnosis by MI assessment showed a residual correlation between items p2 (feeling sad, depressed, or hopeless) and p6 (feeling bad about yourself; thinking that you are a failure, or that you were disappointing yourself or your family) (EPC = 0.292). Freely estimating this parameter yielded a low residual correlation (0.25) between the items and a negligible improvement in model fit (∆CFI = 0.004) (Table 3), which indicates the absence of content overlap and supported the decision to keep both items in the model. The MI did not indicate other changes.
The positive and significant relationships (p < 0.001) of the construct measured by the PHQ-8 (depression) with other similar constructs corroborated the convergent validity. The correlation was strong with CMD (r = 0.592) and moderate with panic disorder (r = 0.326) and generalized anxiety disorder (r = 0.274) (

Discussion
The psychometric properties of the Brazilian version of the PHQ-8 in the general population demonstrated that the instrument can measure the depression construct in the studied context, showing adequate configural and metric structures (dimensional validity), as well as a connection between the obtained construct and similar tests/scales (convergent validity). The findings also provide evidence of the reliability of the items. This study evaluated the instrument in a population living in the urban area of a Brazilian municipality, with a predominance of women, young people, black people, people without a partner, and people with a low schooling level and low income. For comparison, the Brazilian population registers a predominance of women (52.3%), people aged up to 39 years (60%), black people (54%), people without a partner (61%), people with < 15 years of study (90%) and people with an income below two minimum wages (70.8%) 50 . Therefore, the general profile of this study sample is similar to that of the Brazilian population.

Dimensional validity
The dimensional validity of the Brazilian version of PHQ-8 endorsed the structure of a single factor 27,33,34 and reinforced depression as a single construct (DSM-5, 2014). This indicates that the adaptation made in the semantics of response category "1" did not change the construct's behavior. Since all four response options on the scale were used, they may represent the respondents' frequency of depressive symptoms.
Studies that indicate the unidimensionality of the PHQ-8 have been conducted in specific populations, such as outpatients of a public hospital in Bolivia 33 , university students from Latin America 27 and Mexican and Central American adults 34 . In studies in which the adjustment of models with two factors was satisfactory, these proved to be highly correlated 27,34 , indicating unidimensionality or a higher order factor.
The exploratory analysis in this study showed that the obtained eigenvalues supported the extraction of a single factor. This unidimensionality was sustained by satisfactory fit parameters and the absence of residual correlations. The single-factor model showed excellent indices of absolute and incremental fit 43,46 , which indicate reliable and discriminating items.
Regarding the presence of residual correlation, although the MIs suggested overlapping content of depressed mood symptoms (p2) and feelings of guilt or worthlessness (p6), which may be conceptually related, the residual correlation estimate was < 0.3, which does not indicate overlap/redundancy of content or need to reevaluate the items 47 . This finding corroborated the conditional independence of the instrument's items.

Convergent factorial validity
The study of the factor loadings supported the evaluation of the convergent factorial validity of the eight items of the PHQ for measuring depression. The high loadings observed in all items indicated that they converge on depression. The item "Do you feel sad, depressed, or hopeless" (depressed mood) contributed the most to the construct. This is expected given the concept of depression as a mood disorder, in which the main characteristic is the presence of sad, empty, or irritable moods 5 .
However, the AVE was borderline and is therefore admissible only with reservations 46 . Considering the upper limit of the 95%CI, 51% of the latent trait of depression in the studied general population can be mapped by the eight items of the PHQ-8. Factorial validity values close to our results were also identified in the general German (variance = 0.50) 51 and Taiwanese and Chinese (variance = 0.42) 6 populations, using the PHQ-9.
These findings suggest that the suppression of the ninth item of the instrument did not alter the convergent factorial validity in general populations. We emphasize the lack of international or national studies of performance evaluation of the PHQ-8, thus requiring future studies to reinforce the results and interpretations.
The hypothesis testing, which assessed the correlation with other similar mental health constructs, confirmed the external correlation and endorsed the convergent factorial validity of the PHQ-8 in general populations 49,52 . Positive correlations with the scores of instruments that assessed CMD, generalized anxiety disorder, and panic disorder (PHQ modules) corroborated the convergent validity of the PHQ-8. Studies showed similar results with undergraduate students from the United States 27 , immigrants from Mexico and Central America 34 and individuals from a psychiatric department of a university hospital in the Republic of Korea 23 ; the latter reported positive correlations with a scale used to validate the diagnosis of depression, the Hamilton Depression Rating Scale (HAMD). We did not find studies using the PHQ-8 in a general population that evaluated this psychometric property.

Reliability of items
The CR identifies the level of interrelation between the instrument's items 45 , which evidences the homogeneous measurement of a common characteristic 49,53 , in this case, the depression construct.
The CR indicator is based on the factor loadings. Since the items' factor loadings can vary, the CR is considered a more robust parameter for reliability analysis than Cronbach's alpha 54 . The PHQ-8 estimate of CR (0.88) indicated satisfactory homogeneity of the items, which consistently represented depression.
Cronbach's alpha is relatively more fragile and strongly influenced by the reduced number of items in the PHQ-8 49 . In the psychometric history of the instrument, we identified reports on the reliability of the items in international studies in specific populations: patients with chronic heart failure 29 , outpatients 33 and individuals who visited the psychiatric department of a university hospital in the Republic of Korea 23 .
Similar to other studies 23,32 , the removal of item 9, on suicidal ideation, did not change the internal consistency of the scale.

Limitations
We used robust methods to analyze the psychometric properties of the PHQ-8 in a representative sample of the general population of a medium-sized Brazilian municipality (largest city in the Northeast region of the country). The results showed satisfactory validity and reliability of the PHQ-8 for use in the Brazilian scenario. However, some limitations must be considered.
The data used for the analysis were collected in 2007, which may represent a specific moment marked by exposures related to depression different from those of the Brazilian population nowadays. However, the temporality of these data can be relativized, especially due to the relevance of psychic morbidity in the populations' general context of illness, with a continuous growth of mental disorders in the last 10 years in Brazil, mainly depression 55,56 . Besides, evidence shows that situations of vulnerability have increased in that time, including situations related to the populations' sociodemographic conditions. Thus, the context in which this instrument was applied, which constituted the empirical basis of this study, did not show significant changes that influence or invalidate our results.
Besides the temporal aspect, caution and considerations are necessary to analyze our results due to the Brazilian states' diversity and their significant regional differences, and because our results refer to the investigation of a single Brazilian location.
We used a cluster-sampling method in which the observations are not independent 57,58 , that is, the neighborhood or homogeneity effect cannot be ruled out. We tried to correct this effect by weighting the analyses by the sample weights, a procedure that compensates for the different selection probabilities at each stage of the cluster sampling. The increase in the size of the studied sample was another resource used to expand the sample heterogeneity and the possibilities of inclusion of population groups.
We used PHQ-9 performance studies for part of the comparisons of the obtained results. Although the results reveal that the suppression of the ninth item did not change the psychometric properties of the PHQ to measure depression, further studies with the PHQ-8 are essential to consolidate the properties identified in this study and to advance the knowledge of its psychometric properties, for example, the evaluation of the dimensional structure and of measurement invariance between groups 47 . The instrument's performance must be stable in different population groups to enable more reliable comparisons. Thus, the use of multiple-group confirmatory factor analysis is suggested to identify aspects that may interfere in the assessment of the construct, such as aspects related to gender and work situation.

Final considerations
Despite the limitations, the findings suggest that the Brazilian version of the PHQ-8 has a one-dimensional structure, good validity and reliability, being useful and effective in contexts of research and mental health care. Based on the scientific literature and the results of this study, we concluded that the PHQ-8 shows evidence of validity that supports its use in the general Brazilian population, although with caution because the study was conducted in a single location.
Thus, the results strengthen its usefulness for mental health care in Brazil, since it has psychometric properties suitable for diagnosing depression in the general population, is freely available, and has few items that are easy to apply, analyze and interpret. Therefore, the PHQ-8 allows the early diagnosis of the disease, with the possibility of more effective interventions at the beginning of the illness, and can be used in the routine of health care services at different levels of care, especially in primary health care.