SciELO - Scientific Electronic Library Online



Revista Brasileira de Economia

Print version ISSN 0034-7140; On-line version ISSN 1806-9134

Rev. Bras. Econ. vol.61 no.4 Rio de Janeiro Oct./Dec. 2007 

Using normalized equations to solve the indetermination problem in the Oaxaca-Blinder decomposition: an application to the gender wage gap in Brazil



Luiz Guilherme Scorzafave; Elaine Toldo Pazello

Ribeirão Preto School of Economics, Administration and Accounting of the University of São Paulo (FEA-RP/USP)




There are hundreds of works that implement the Oaxaca-Blinder decomposition. However, this decomposition is not invariant to the choice of reference group when dummy variables are used. This paper applies the solution proposed by Yun (2005a,b) for this identification problem to Brazilian gender wage gap estimation. Our principal finding is the increasing difference in part-time work coefficients between men and women, which contributes to narrowing the gender wage gap. Other Brazilian studies that do not correct for the identification problem have found different results.

Keywords: Oaxaca decomposition, woman, gender, wage, normalized regression.
JEL Code: J16, J7, J31.


There are hundreds of studies that implement the Oaxaca-Blinder decomposition. However, this decomposition is not invariant to the choice of reference groups when binary variables are used as regressors. This article applies the solution proposed by Yun (2005a,b) for this identification problem to the estimation of the gender wage differential in Brazil. The growing difference between men and women in the regression coefficient associated with part-time work has been contributing to the reduction of the gender wage differential. Other studies carried out in Brazil that did not use any correction for the identification problem found different results.




The decomposition proposed by Oaxaca (1973) and Blinder (1973) has been applied in hundreds of studies around the world, including in Brazil.¹ In general, the methodology is used to separate the wage differential between two groups (men/women, white/non-white) into two components. The first is related to differences in the observable characteristics of the two groups. For example, men could earn more because they have more experience or are more educated than women. This part, called the "explained differential", accounts for only a fraction of the wage gap between the groups. The remaining gap, called the "unexplained differential", is attributable to different returns to the same characteristics in the two groups. For instance, men and women with the same level of education could receive different rewards.

Some authors attribute this unexplained gap to discrimination, but there is controversy in the literature over whether this conclusion is valid. The argument against it is that a differential can only be attributed to discrimination if the estimation has considered all variables that affect wages and differ between groups. It is hard to believe that any regression specification can ensure this.

Apart from this debate, the Oaxaca-Blinder decomposition can be used to assess the contribution of each variable to the explained and unexplained gap. However, Oaxaca and Ransom (1999) show that the share of the differential attributable to the model's dummy variables depends on the choice of the reference group. On the other hand, the overall fractions of the gap that are "explained" and "unexplained" are not affected by this problem.

Yun (2005a,b) proposes a method to implement the Oaxaca-Blinder decomposition that solves this indetermination problem. He estimates "normalized" equations, imposing the restriction that the coefficients of each set of dummies must sum to zero.

In the Brazilian case, there are no papers that consider this indetermination related to dummy variables. The aim of this paper is therefore to present the indetermination problem and the solution proposed by Yun (2005b), and to apply that solution to the Brazilian case. The next section sets out the problem and the solution. We then apply the solution to three years in Brazil: 1988, 1996 and 2004. The choice of these years allows a comparison with the results of Giuberti and Menezes-Filho (2005), who used 1988 and 1996 in their analysis.



2.1. The problem

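We present the identification problem in the Oaxaca-Blinder decomposition based on Oaxaca and Ransom (1999). Suppose that one estimates separate wage regressions for males and females. Suppose there is a set of J dummy variables (d's) and also L continuous variables (z's) in the model, where $\sum_{j=1}^{J} d_{ij} = 1$ and $i = m, f$. For instance, d could be region of residence. In Brazil, J = 5 (South, Southeast, North, Northeast, Midwest). Without loss of generality, $d_{i1}$ will be the omitted category. The separately estimated wage equations for type i individuals at the sample means are given by:

$$\bar{Y}_i = \hat{\beta}_{i0} + \sum_{j=2}^{J} \hat{\beta}_{ij}\,\bar{d}_{ij} + \sum_{l=1}^{L} \bar{Z}_{il}'\,\hat{\theta}_{il}, \qquad i = m, f \tag{1}$$

where $\bar{Y}_i$ is the mean log wage, $\hat{\beta}_{i0}$ is the estimated intercept, $\hat{\beta}_{ij}$ is the estimated coefficient for the dummy variable $d_{ij}$, whose sample mean is $\bar{d}_{ij}$; $\hat{\theta}_{il}$ is a vector of estimated slope coefficients for the set of regressors comprising the lth variable, and $\bar{Z}_{il}$ is a vector of regressor means for the set of regressors comprising the lth variable.

Still following Oaxaca and Ransom (1999), the mean wage difference can be decomposed as follows:

$$\bar{Y}_m - \bar{Y}_f = (\hat{\beta}_{m0} - \hat{\beta}_{f0}) + \sum_{j=2}^{J} \bar{d}_{fj}\,(\hat{\beta}_{mj} - \hat{\beta}_{fj}) + \sum_{l=1}^{L} \bar{Z}_{fl}'\,(\hat{\theta}_{ml} - \hat{\theta}_{fl}) + \sum_{j=2}^{J} (\bar{d}_{mj} - \bar{d}_{fj})\,\hat{\beta}_{mj} + \sum_{l=1}^{L} (\bar{Z}_{ml} - \bar{Z}_{fl})'\,\hat{\theta}_{ml}$$

where the last two terms measure the "endowment" effect, while the first three capture the "discrimination" effect.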
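The aggregate decomposition can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are simulated, the function names (`ols`, `twofold_decomposition`, `simulate`) are hypothetical, and the male coefficients are taken as the reference wage structure. It is not the authors' code and does not use PNAD data.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients; X must already contain a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def twofold_decomposition(X_m, y_m, X_f, y_f):
    """Return (coefficient effect, endowment effect) of the mean gap,
    weighting endowment differences by the male coefficients."""
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    coefficient_effect = xbar_f @ (b_m - b_f)    # "discrimination" part
    endowment_effect = (xbar_m - xbar_f) @ b_m   # "endowment" part
    return coefficient_effect, endowment_effect

def simulate(rng, n, intercept, slope, region_probs):
    """Hypothetical log wages: 3 regions (first one omitted) and one continuous regressor."""
    region = rng.choice(3, size=n, p=region_probs)
    dummies = np.eye(3)[region][:, 1:]           # drop the first category
    z = rng.normal(10.0, 2.0, size=n)
    X = np.column_stack([np.ones(n), dummies, z])
    noise = rng.normal(0.0, 0.3, size=n)
    y = intercept + dummies @ np.array([0.2, -0.1]) + slope * z + noise
    return X, y

rng = np.random.default_rng(0)
X_m, y_m = simulate(rng, 2000, 1.0, 0.08, [0.3, 0.5, 0.2])
X_f, y_f = simulate(rng, 2000, 0.8, 0.07, [0.4, 0.4, 0.2])

coef_effect, endow_effect = twofold_decomposition(X_m, y_m, X_f, y_f)
raw_gap = y_m.mean() - y_f.mean()
print(raw_gap, coef_effect + endow_effect)       # the two numbers coincide
```

Because each regression includes a constant, the fitted equation passes through the sample means, so the two effects add up to the raw gap in mean log wages.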

The first thing that Oaxaca and Ransom (1999, 156) indicate about this decomposition is that "the estimated overall discrimination and the estimated overall endowment effect are invariant to the choice of left-out reference group and the suppression of the constant term in the absence of a left-out reference group." The most important issue, however, is that the contribution of variable d to the discrimination effect is sensitive to the left-out reference group because the intercept varies with changes in that reference group. To see this, suppose that the last dummy variable, rather than the first, has been chosen as the left-out reference group. In this case, the "discrimination" effect attributed to variable d would be $\sum_{j=1}^{J-1} \bar{d}_{fj}\,(\hat{\beta}^{*}_{mj} - \hat{\beta}^{*}_{fj})$, where $\hat{\beta}^{*}_{ij} = \hat{\beta}_{ij} - \hat{\beta}_{iJ}$, $\hat{\beta}_{ij}$ is the coefficient on $d_{ij}$ when the first category is omitted, and $\hat{\beta}_{i1} = 0$.

Oaxaca and Ransom (1999) show that if there is only one set of dummy variables in the regression, this problem can be solved by adding the intercept difference $(\hat{\beta}_{m0} - \hat{\beta}_{f0})$ to the contribution of variable d to the "discrimination" effect. However, this solution is not valid if there is more than one set of dummy variables in the estimated equation, a very common situation in the context of wage regressions.
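The sensitivity to the reference group can be checked numerically. The sketch below is an illustration under stated assumptions: the data are simulated, the helper names (`design`, `coefficient_effect`, `simulate`) are hypothetical, and coefficient differences are weighted by female means. It fits the same model twice, omitting a different category each time.

```python
import numpy as np

def design(cat, z, n_cat, omit):
    """Constant, dummies for every category except `omit`, then z."""
    keep = [j for j in range(n_cat) if j != omit]
    dummies = (cat[:, None] == np.array(keep)[None, :]).astype(float)
    return np.column_stack([np.ones(len(z)), dummies, z])

def coefficient_effect(men, women, n_cat, omit):
    """Detailed coefficient ("discrimination") effect, weighted by female means."""
    X, b = {}, {}
    for name, (cat, z, y) in (("m", men), ("f", women)):
        X[name] = design(cat, z, n_cat, omit)
        b[name] = np.linalg.lstsq(X[name], y, rcond=None)[0]
    terms = X["f"].mean(axis=0) * (b["m"] - b["f"])
    return {"intercept": terms[0], "dummies": terms[1:n_cat].sum(),
            "z": terms[n_cat], "total": terms.sum()}

def simulate(rng, n, intercept, effects, probs):
    """Hypothetical log wages with category-specific effects."""
    cat = rng.choice(len(effects), size=n, p=probs)
    z = rng.normal(10.0, 2.0, size=n)
    y = intercept + np.asarray(effects)[cat] + 0.08 * z + rng.normal(0.0, 0.3, size=n)
    return cat, z, y

rng = np.random.default_rng(1)
men = simulate(rng, 3000, 1.0, [0.0, 0.3, -0.2], [0.3, 0.5, 0.2])
women = simulate(rng, 3000, 0.8, [0.0, 0.1, 0.2], [0.4, 0.4, 0.2])

first_omitted = coefficient_effect(men, women, n_cat=3, omit=0)
last_omitted = coefficient_effect(men, women, n_cat=3, omit=2)
for key in ("intercept", "dummies", "z", "total"):
    print(key, first_omitted[key], last_omitted[key])
```

In this sketch the total coefficient effect and the contribution of the continuous regressor are the same under both choices, while the split between the intercept and the dummy set changes, which is the indetermination described in the text.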

2.2. The solution

Yun (2005a) proposes a methodology to resolve the identification problem, based on normalized regressions. The idea behind normalized regressions is that if

"alternative reference groups yield different estimates of the (...) coefficients effect for each individual variable, then it is natural to obtain estimates of the (...) effect for every possible specification of the reference groups and take the average of the estimates of the (...) effect with various reference groups as the 'true' contributions of individual variables to wage differentials." (Yun, 2005a, 766)

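But Yun (2005b) shows that one does not need to proceed in this cumbersome way. It is possible to implement the method by estimating only one equation. To illustrate the method, we follow Yun (2005b) and suppose that we have a set of J dummy variables (d's) and also L continuous variables (z's) in the model as before and, additionally, another set of K dummy variables (q's). To simplify the presentation, we ignore the subscript i in this section.

$$y = \alpha + \sum_{j=2}^{J} \gamma_j d_j + \sum_{k=2}^{K} \lambda_k q_k + \sum_{l=1}^{L} \delta_l z_l + e \tag{2}$$

This equation is called the "usual regression". Yun proposes an alternative specification that does not omit the reference group:

$$y = \alpha + \sum_{j=1}^{J} \gamma_j d_j + \sum_{k=1}^{K} \lambda_k q_k + \sum_{l=1}^{L} \delta_l z_l + e \tag{3}$$

Yun (2005b) shows that if we estimate a model omitting, for example, the first category of variable d, we can obtain the estimates of $\gamma_j$ that would prevail if group r were omitted instead simply by computing $\gamma_j - \gamma_r$, while the intercept changes from $\alpha + \gamma_1$ to $\alpha + \gamma_r$. Taking the averaging approach implies imposing $\sum_{j=1}^{J} \gamma^{*}_j = 0$ and $\sum_{k=1}^{K} \lambda^{*}_k = 0$, as Suits (1984) states. In particular, "since these restrictions do not have unique solutions," he specifies the coefficients of the normalized regression as $\gamma^{*}_j = \gamma_j + m_\gamma$ and $\lambda^{*}_k = \lambda_k + m_\lambda$, and refines the problem of deriving the normalized regressions as finding values of $m_\gamma$ and $m_\lambda$. It turns out that their values are $m_\gamma = -\sum_{j=1}^{J} \gamma_j / J$ and $m_\lambda = -\sum_{k=1}^{K} \lambda_k / K$, where $\gamma_1 = \lambda_1 = 0$ (Yun, 2005b, 3). Considering this, Yun (2005b) proposes the "normalized equation":

$$y = (\alpha + \bar{\gamma} + \bar{\lambda}) + \sum_{j=1}^{J} (\gamma_j - \bar{\gamma})\, d_j + \sum_{k=1}^{K} (\lambda_k - \bar{\lambda})\, q_k + \sum_{l=1}^{L} \delta_l z_l + e \tag{4}$$

where $\bar{\gamma} = \sum_{j=1}^{J} \gamma_j / J$, $\bar{\lambda} = \sum_{k=1}^{K} \lambda_k / K$ and $\gamma_1 = \lambda_1 = 0$.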

If we estimate equation (4) for men and women separately, we can implement the Oaxaca decomposition of the wage equation that is invariant to the choice of the omitted category in the dummy variables.
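The normalization can be sketched in a few lines of Python. The sketch assumes a single set of dummies plus one continuous regressor, simulated data, and hypothetical helper names (`fit_usual`, `normalize`). It recovers normalized coefficients from a usual regression by subtracting the mean of the dummy coefficients (with a zero for the omitted category) and adding that mean to the intercept.

```python
import numpy as np

def fit_usual(cat, z, y, n_cat, omit):
    """Usual regression with category `omit` left out.
    Returns the intercept and the dummy coefficients with a zero for `omit`."""
    keep = [j for j in range(n_cat) if j != omit]
    dummies = (cat[:, None] == np.array(keep)[None, :]).astype(float)
    X = np.column_stack([np.ones(len(y)), dummies, z])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    gamma = np.zeros(n_cat)          # the omitted category keeps a zero coefficient
    gamma[keep] = b[1:n_cat]
    return b[0], gamma

def normalize(alpha, gamma):
    """Shift the dummy coefficients so they sum to zero; move the shift to the intercept."""
    gamma_bar = gamma.mean()
    return alpha + gamma_bar, gamma - gamma_bar

# Simulated data: 5 categories (for example, regions) with hypothetical effects.
rng = np.random.default_rng(2)
n, n_cat = 2000, 5
cat = rng.choice(n_cat, size=n)
z = rng.normal(10.0, 2.0, size=n)
effects = np.array([0.0, 0.3, -0.2, 0.1, 0.4])
y = 1.0 + effects[cat] + 0.08 * z + rng.normal(0.0, 0.3, size=n)

alpha_a, gamma_a = normalize(*fit_usual(cat, z, y, n_cat, omit=0))
alpha_b, gamma_b = normalize(*fit_usual(cat, z, y, n_cat, omit=3))
print(gamma_a)
print(gamma_b)   # same normalized coefficients, whichever category was omitted
```

Since the normalized intercept and coefficients do not depend on which category was dropped, a detailed decomposition built on them inherits that invariance.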



In this section we apply the solution provided by Yun (2005b) to solve the identification problem. We first obtain the normalized regressions and then apply the Oaxaca decomposition. We use data from the Brazilian National Household Survey (Pesquisa Nacional por Amostra de Domicílios - PNAD), conducted annually by the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística - IBGE, the Brazilian census bureau), for three years: 1988, 1996 and 2004.

The sample is composed of individuals between the ages of 25 and 54 living in urban areas who have positive job earnings. The choice of this age range is justified because it is the most important in terms of participation in the labor market. The exclusion of residents of rural areas is due to the change in the PNAD sample in 2004, which began to incorporate the rural area of the North region. Therefore, if we included these individuals, the comparison of the results among the three years would be impaired.

Finally, since we work only with individuals with positive job earnings (i.e., we exclude labor that is not remunerated), we avoid the problem of harmonization of the PNADs starting in 1992 with previous surveys.²

Table 1 presents the description of the data. The data on labor income refer to the hourly wage, measured in Reais of January 2002, deflated according to the index proposed in Corseuil and Foguel (2002).



The first thing to note is that the gender wage gap narrowed in Brazil, from 0.475 in 1988 to 0.216 in 2004. In other words, in 1988 men earned wages 47.5% higher than women's, but by 2004 this difference had fallen to 21.6%. In terms of education, there was a substantial improvement over that interval. For instance, the proportion of men with 11 years of schooling increased by 11.4 percentage points between 1988 and 2004. Despite this, women continue to be more educated than men: in 2004, 21% of women had 12 or more years of schooling, while only 14% of men had attained this level. The age profiles of men and women are quite similar. With regard to region of residence, the data show a concentration of Brazilian workers in the Southeast, Northeast and South regions and, notably, a falling proportion of workers living in metropolitan areas. However, the most interesting fact is the increasing proportion of men working part-time, along with a decrease in this proportion among women. Even so, in 2004, 14.5% of female workers were in part-time activities, compared with only 3.4% of men. Concerning informality, the gap between men and women fell considerably over the period, from 12 percentage points to 2 points in favor of women. The main cause of this decrease was the sharp growth of informality among men between 1988 and 1996.
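A note on reading these magnitudes: if the reported gaps are differences in mean log hourly wages (an assumption here, consistent with the log-wage regressions used later but not stated for this table), then treating 0.475 as 47.5% is the usual small-difference approximation. The short sketch below, with a hypothetical helper name, converts log points to the exact ratio of geometric mean wages.

```python
import math

def log_gap_to_percent(log_gap):
    """Exact percentage difference implied by a gap measured in log points."""
    return 100.0 * math.expm1(log_gap)   # 100 * (exp(gap) - 1)

gap_1988 = log_gap_to_percent(0.475)
gap_2004 = log_gap_to_percent(0.216)
print(round(gap_1988, 1), round(gap_2004, 1))
```

Under that assumption, the gaps correspond to roughly 61% in 1988 and 24% in 2004, so the approximation understates the larger gap more than the smaller one, while the direction and rough size of the decline are unchanged.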

After this short discussion of the descriptive statistics, we turn to the results of the methodology adopted in the paper. First we analyze the regression estimates (equation 2), which are in the appendix (Table 6). The dependent variable is the logarithm of the hourly wage and the independent variables are dummies for age (25-29, 30-34, 35-39, 40-44, 45-49, 50-54), schooling (0-3, 4, 5-7, 8, 9-10, 11, 12 years or more), race (white, non-white), region (North, Northeast, Southeast, South, Midwest), part-time (working less than 20 hours per week), informal (not contributing to the social security system³) and metropolitan area.

In all years, the coefficients have the expected signs. In terms of age, the coefficients show a positive relationship between wage and age for ages between 25 and 39. For other ages, there is no regular relation. The results also indicate that the higher the educational level, the greater the wage. It is interesting to observe the sheepskin effect in the educational estimates, although the returns to education decrease between 1988 and 2004. The white coefficient has the sign normally obtained, regardless of gender: white workers earn more than comparable non-white workers. However, this difference is smaller in 2004 than in previous years. The regional coefficients differ across periods. In the Midwest, wages were lower than in the North region (reference group) in 1988 and became higher in 2004. Wages in the Northeast are systematically lower than in the North in all years. In the other regions the wage differential, which favored the North in 1988, disappeared (South) or reversed (Southeast) in the later periods. The part-time coefficient has a positive sign for both women and men, with a larger magnitude for men. There is a significant increase in the coefficient in 1996, doubling in the case of men. In 2004, the coefficient falls, but is still higher than in 1988. The metropolitan coefficient indicates that individuals living in metropolitan areas earn higher wages than others. However, the differential declines over the period studied. Finally, workers in the informal sector earn lower wages than other workers in all the years analyzed, with a sharp fall in 1996 relative to the other years.

The appendix also contains the estimation results of equation (4) for 1988 (Table 7), 1996 (Table 8) and 2004 (Table 9). We can see that the sum of the coefficients of each category is zero, as required to apply the Oaxaca decomposition without the indetermination problem.

The Oaxaca decomposition technique identifies the factors that explain the wage differential between men and women, dividing them into two types: a part attributed to the observable characteristics and another part attributed to the "market return" to these characteristics. The second part could be attributed to "discrimination", because men and women receive different prices for their characteristics. Table 2 shows the evolution of the gender wage gap and the decomposition analysis.



As noted above, the data show a significant decrease in the gender wage gap between 1988 and 2004. In 1988, men's wages were 47.5% higher than women's, but in 2004 this advantage dropped to 21.6%. The characteristics contribute to reducing the gap, while the coefficients contribute to raising it. If we were sure that the model included all characteristics that explain the gap, the evidence would indicate the existence of discrimination favoring men. However, it is important to note that the main source of the fall in the gap between the years was the decline of the "discrimination" term. Tables 3, 4 and 5 show the gender wage gap decomposition using the traditional and the normalized equations for 1988, 1996 and 2004, respectively.

Columns 1 and 2 show the variables' contribution to the wage gap (which is the same in the traditional and normalized regressions). Two of the most important variables are schooling and the part-time dummy, both of which reduce the wage differential. The estimates (in the appendix) show a positive relation between these variables and wages. Since women are on average more educated than men and are the majority in part-time occupations, these variables contribute to reducing the gender wage gap. On the other hand, the higher informality among women explains 11% of the gender wage gap in 1988. The contribution of the other variables is very small.

The coefficients' effects are very different when we use the traditional instead of the normalized equation.⁴ This highlights the importance of the methodology applied here. For instance, the results using the traditional equation indicate that while age contributes to diminishing the gap, schooling contributes to raising it. When the normalized regression is used, on the other hand, the effects of the schooling and age coefficients act in the same direction (reducing the gap) and also lose significance. However, the main change is that the part-time dummy gains relevance in this decomposition. In 1988, the return to this characteristic reduces the total differential by 11.7% and in 2004 by 46.6%. That is, women's comparative advantage in these occupations could possibly explain the falling wage gap between 1988 and 2004.








There are hundreds of works all over the world that implement the Oaxaca-Blinder decomposition. However, most of these works are plagued by the identification problem that arises when a set of dummy variables is used, as Oaxaca and Ransom (1999) show.

In this paper, we applied the solution proposed by Yun (2005a,b) to the Brazilian gender wage gap estimation. Our first finding is that the gender gap has been narrowing in Brazil since 1988. The results also show that, as women are more educated and more engaged in part-time activities than men, these factors contribute to reducing the gender gap. On the other hand, the difference in the constant term between men and women explains the entire wage differential. However, the increasing difference in the part-time coefficients between men and women is helping to alleviate this situation and can also be identified as responsible for the narrowing gender wage gap in Brazil since 1988.

Giuberti and Menezes-Filho (2005), who do not use any correction for the identification problem, conclude that different returns related to age are important in explaining the wage gap and that the part-time dummy is not important. However, as demonstrated in this work, these results arise basically from the choice of the omitted categories of the qualitative variables incorporated in the regression model. The application of the methodology of Yun (2005a), which resolves the indetermination problem, highlights the importance of the part-time variable in the decline in the gender wage gap in Brazil over the past 20 years.



Blinder, A. S. (1973). Wage discrimination: Reduced form and structural estimates. The Journal of Human Resources, 8(4):436–455.

Corseuil, C. & Foguel, M. (2002). Uma sugestão de deflatores para rendas obtidas a partir de algumas pesquisas domiciliares do IBGE. Texto para Discussão IPEA, 9(897).

Giuberti, A. & Menezes-Filho, N. (2005). Discriminação dos rendimentos por gênero: uma comparação entre o Brasil e os Estados Unidos. Economia Aplicada, 9(3):369–383.

Kassouf, A. L. (1998). Wage gender discrimination and segmentation in the Brazilian labor market. Brazilian Journal of Applied Economics, 2(2):243–269.

Lovell, P. A. & Wood, C. H. (1998). Skin color, racial identity and life chances in Brazil. Latin American Perspectives, 25(3):90–109.

Oaxaca, R. (1973). Male-female wage differentials in urban labor markets. International Economic Review, 14:693–709.

Oaxaca, R. & Ransom, M. (1999). Identification in detailed wage decompositions. Review of Economics and Statistics, 81(1):154–157.

Ometto, Hoffman and Alves (1999). Participação da mulher no mercado de trabalho: Discriminação em Pernambuco e São Paulo. Revista Brasileira de Economia, 53(3):287–322.

Suits, D. (1984). Dummy variables: Mechanics v. interpretation. Review of Economics and Statistics, 66(1):177–180.

Yun, M. (2005a). A simple solution to the identification problem in detailed wage decompositions. Economic Inquiry, 43(4):766–772.

Yun, M. (2005b). Normalized equation and decomposition analysis: computation and inference. IZA Discussion Paper, Institute for the Study of Labor, 9(1822).



1 For Brazil, see for instance Lovell and Wood (1998), Kassouf (1998), Ometto, Hoffman and Alves (1999).
2 In 1992, the PNAD began to classify workers in production for their own consumption, construction for their own use and those working in other non-remunerated labor as occupied. In previous PNADs only a part of these people were considered occupied. But since we are not using the activity/occupation criterion for the cross-section of the sample, we avoid the incompatibility between the years of the survey.
3 A word of caution is necessary here: under our definition of informality, some public sector workers were assigned to the informal sector in 1988, because before 1992 the PNAD methodology does not allow public and private sector workers to be separated perfectly.
4 Table 10 of the appendix presents the results of the decomposition when the omitted group of the schooling, age and region variables is altered. In this case, the contribution of the (traditional) coefficients is completely different from that presented in Tables 3, 4 and 5, which demonstrates the inefficacy of this approach.













Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License