Using Normalized Equations to Solve the Indetermination Problem in the Oaxaca-Blinder Decomposition: An Application to the Gender Wage Gap in Brazil

There are hundreds of works that implement the Oaxaca-Blinder decomposition. However, when dummy variables are used, the detailed contributions of those variables in this decomposition are not invariant to the choice of reference group. This paper applies the solution proposed by Yun (2005a, 2005b) for this identification problem to the estimation of the Brazilian gender wage gap. Our principal finding is an increasing difference in part-time work coefficients between men and women, which helps narrow the gender wage gap. Other studies of Brazil that do not correct for the identification problem have found different results.


Introduction
The decomposition proposed by Oaxaca (1973) and Blinder (1973) has been applied in hundreds of studies around the world, including Brazil¹. In general, the methodology is used to split the wage differential between two groups (men/women, white/non-white) into two components, one of which is related to differences in the observable characteristics of the groups. For example, men could earn more because they have more experience or more education than women.
This part, called the "explained differential", generally accounts for only a fraction of the wage gap between the groups. The remaining gap, called the "unexplained differential", is attributable to different returns to the characteristics in the two groups. For instance, men and women with the same level of education could receive different rewards. Some authors attribute this unexplained gap to discrimination, but there is controversy in the literature over whether this conclusion is valid. The argument against it is that a differential can only be attributed to discrimination if the estimation has considered all variables that affect wages and differ between the groups, and it is hard to believe that any regression specification can ensure this.
Apart from this debate, the Oaxaca-Blinder decomposition can be used to assess the contribution of each variable to the explained and unexplained gap.
However, Oaxaca and Ransom (1999) show that the share of the differential attributed to the model's dummy variables depends on the choice of the reference group. By contrast, the overall split of the gap into "explained" and "unexplained" parts does not suffer from this problem.
Yun (2005a, 2005b) proposes a method to implement the Oaxaca-Blinder decomposition that solves this indetermination problem. He estimates "normalized" equations, imposing the restriction that the coefficients within each set of dummy variables sum to zero.
In the Brazilian case, no previous paper has considered this indetermination related to dummy variables. The aim of this paper, therefore, is to present the indetermination problem and the solution proposed by Yun (2005b), and to apply it to Brazilian data. The next section presents the problem and Yun's solution. We then apply this solution to three years in Brazil: 1988, 1996 and 2004. These years were chosen to allow a comparison with the results of Giuberti and Menezes-Filho (2005), who analyzed 1988 and 1996.

The problem
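We present the identification problem in the Oaxaca-Blinder decomposition following Oaxaca and Ransom (1999). Suppose that one estimates separate wage regressions for males and females, indexed by $i = m, f$. Suppose also that the model contains a set of $J$ dummy variables (d's), with $\sum_{j=1}^{J} d_{ij} = 1$ for every individual, and $L$ continuous variables (z's). Taking $d_{i1}$ as the omitted category, the separately estimated wage equations for type-$i$ individuals, evaluated at the sample means, are
$$\bar{y}_i = \hat\alpha_{i0} + \sum_{j=2}^{J} \hat\gamma_{ij}\,\bar{d}_{ij} + \sum_{l=1}^{L} \bar{z}_{il}'\hat\delta_{il}, \qquad i = m, f, \qquad (1)$$
where $\bar{y}_i$ is the mean log wage, $\hat\alpha_{i0}$ is the estimated intercept, $\hat\gamma_{ij}$ is the estimated coefficient on the dummy variable $d_{ij}$, $\bar{d}_{ij}$ is the sample mean of $d_{ij}$, $\hat\delta_{il}$ is a vector of estimated slope coefficients for the set of regressors comprising the $l$th variable, $\bar{z}_{il}$ is the vector of regressor means for that set, and $\hat\gamma_{i1} = 0$. Still following Oaxaca and Ransom (1999), we can decompose the mean wage difference in two ways:
$$\bar{y}_m - \bar{y}_f = (\hat\alpha_{m0} - \hat\alpha_{f0}) + \sum_{j=2}^{J} \bar{d}_{fj}(\hat\gamma_{mj} - \hat\gamma_{fj}) + \sum_{l=1}^{L} \bar{z}_{fl}'(\hat\delta_{ml} - \hat\delta_{fl}) + \sum_{j=2}^{J} \hat\gamma_{mj}(\bar{d}_{mj} - \bar{d}_{fj}) + \sum_{l=1}^{L} (\bar{z}_{ml} - \bar{z}_{fl})'\hat\delta_{ml}$$
$$\bar{y}_m - \bar{y}_f = (\hat\alpha_{m0} - \hat\alpha_{f0}) + \sum_{j=2}^{J} \bar{d}_{mj}(\hat\gamma_{mj} - \hat\gamma_{fj}) + \sum_{l=1}^{L} \bar{z}_{ml}'(\hat\delta_{ml} - \hat\delta_{fl}) + \sum_{j=2}^{J} \hat\gamma_{fj}(\bar{d}_{mj} - \bar{d}_{fj}) + \sum_{l=1}^{L} (\bar{z}_{ml} - \bar{z}_{fl})'\hat\delta_{fl}$$
where the last two terms in each equality measure the "endowment" effect, while the others capture the "discrimination" effect.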
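To make the two equalities concrete, the following Python sketch computes both versions of the decomposition term by term from coefficient vectors and regressor means. It is a minimal illustration under stated assumptions, not the estimation code used in this paper: the function name `oaxaca_blinder`, the layout of the vectors (intercept first, with a mean of 1.0 for the constant) and all numbers in the example are hypothetical.

```python
import numpy as np

def oaxaca_blinder(beta_m, beta_f, xbar_m, xbar_f):
    """Two-fold decomposition of ybar_m - ybar_f at the sample means.

    beta_*: estimated coefficients, intercept first (hypothetical layout).
    xbar_*: regressor means, with 1.0 in the intercept position.
    Returns the gap and, for each equality, the per-regressor
    'discrimination' (coefficient) and 'endowment' terms.
    """
    beta_m, beta_f = np.asarray(beta_m, float), np.asarray(beta_f, float)
    xbar_m, xbar_f = np.asarray(xbar_m, float), np.asarray(xbar_f, float)
    gap = xbar_m @ beta_m - xbar_f @ beta_f
    # First equality: coefficient gaps weighted by women's means,
    # endowment gaps weighted by men's coefficients.
    first = (xbar_f * (beta_m - beta_f), beta_m * (xbar_m - xbar_f))
    # Second equality: men's means and women's coefficients as weights.
    second = (xbar_m * (beta_m - beta_f), beta_f * (xbar_m - xbar_f))
    return gap, first, second

# Hypothetical inputs: intercept, years of schooling, part-time dummy.
gap, (coef1, endow1), (coef2, endow2) = oaxaca_blinder(
    beta_m=[1.0, 0.08, 0.10], beta_f=[0.7, 0.09, 0.20],
    xbar_m=[1.0, 8.0, 0.03], xbar_f=[1.0, 9.0, 0.15])
print(round(gap, 3), round(coef1.sum(), 3), round(endow1.sum(), 3))
```

In both equalities the coefficient and endowment terms add up to the same gap. The intercept contributes only to the coefficient part, because both groups have a mean of 1 for the constant.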
The first point that Oaxaca and Ransom (1999, p. 156) make about this decomposition is that "the estimated overall discrimination and the estimated overall endowment effect are invariant to the choice of left-out reference group and the suppression of the constant term in the absence of a left-out reference group." The more important issue, however, is that the contribution of variable d to the discrimination effect is sensitive to the left-out reference group, because the intercept changes when that reference group changes.
To see this, suppose that the last dummy variable, $d_{iJ}$, is chosen as the left-out reference group instead. The dummy coefficients then become $\hat\gamma_{ij} - \hat\gamma_{iJ}$, the intercept becomes $\hat\alpha_{i0} + \hat\gamma_{iJ}$, and the contribution of variable d to the "discrimination" effect in the first equality would be
$$\sum_{j=1}^{J-1} \bar{d}_{fj}\left[(\hat\gamma_{mj} - \hat\gamma_{mJ}) - (\hat\gamma_{fj} - \hat\gamma_{fJ})\right] = \sum_{j=2}^{J} \bar{d}_{fj}(\hat\gamma_{mj} - \hat\gamma_{fj}) - (\hat\gamma_{mJ} - \hat\gamma_{fJ}),$$
using $\hat\gamma_{i1} = 0$ and $\sum_{j=1}^{J} \bar{d}_{fj} = 1$. At the same time, the intercept difference rises by exactly $\hat\gamma_{mJ} - \hat\gamma_{fJ}$, so the two changes offset each other. Oaxaca and Ransom (1999) show that if there is only one set of dummy variables in the regression, this problem can be solved by incorporating the intercept difference, $\hat\alpha_{m0} - \hat\alpha_{f0}$, into the contribution of variable d to the "discrimination" effect. However, this solution is not valid if there is more than one set of dummy variables in the estimated equation, a very common situation in wage regressions.
Yun (2005a) proposed a methodology to address the identification problem based on normalized regressions. The idea behind normalized regressions is that if "alternative reference groups yield different estimates of the (…) coefficients effect for each individual variable, then it is natural to obtain estimates of the (…) effect for every possible specification of the reference groups and take the average of the estimates of the (…) effect with various reference groups as the 'true' contributions of individual variables to wage differentials." (Yun, 2005a, p. 766) Yun (2005b), however, shows that this cumbersome procedure is unnecessary: the method can be implemented by estimating only one equation. To illustrate the method, we follow Yun (2005b) and suppose that, in addition to the $J$ dummy variables (d's) and $L$ continuous variables (z's) of the previous model, there is another set of $K$ dummy variables (q's). To simplify the presentation, we drop the subscript $i$ from here on.
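The indetermination can be reproduced with simulated data. The sketch below is a hypothetical example, not based on the PNAD data: it draws, for each group, one three-category dummy set d (think of a simplified region variable) and one continuous regressor z, fits the usual OLS wage regressions with NumPy under two different omitted categories, and computes the coefficient ("discrimination") terms weighted by women's means. The function names and all parameter values are assumptions made for illustration. The overall coefficient effect, and the intercept-plus-d sum used in Oaxaca and Ransom's single-set fix, should not change with the reference group; the detailed contribution of d does.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, intercept, region_effects, slope, probs):
    """Simulated log wages with a 3-category dummy set and one regressor z."""
    region = rng.choice(3, size=n, p=probs)
    z = rng.normal(10.0, 2.0, size=n)
    noise = rng.normal(0.0, 0.3, size=n)
    y = intercept + np.asarray(region_effects)[region] + slope * z + noise
    return y, region, z

def design(region, z, ref):
    """Columns: intercept, z, then dummies for every category except `ref`."""
    dummies = [(region == c).astype(float) for c in range(3) if c != ref]
    return np.column_stack([np.ones_like(z), z, *dummies])

# Hypothetical parameters for men and women.
y_m, reg_m, z_m = simulate(5000, 1.0, [0.0, 0.2, 0.4], 0.08, [0.3, 0.4, 0.3])
y_f, reg_f, z_f = simulate(5000, 0.8, [0.0, 0.1, 0.8], 0.09, [0.2, 0.5, 0.3])

def detailed(ref):
    """Coefficient terms (women's means as weights) when `ref` is omitted."""
    X_m, X_f = design(reg_m, z_m, ref), design(reg_f, z_f, ref)
    b_m = np.linalg.lstsq(X_m, y_m, rcond=None)[0]
    b_f = np.linalg.lstsq(X_f, y_f, rcond=None)[0]
    terms = X_f.mean(axis=0) * (b_m - b_f)
    intercept_part = terms[0]
    d_part = terms[2:].sum()          # contribution of the dummy set d
    return d_part, intercept_part, terms.sum()

for ref in (0, 2):
    d_part, a_part, total = detailed(ref)
    print(f"omit {ref}: d={d_part:.3f}  intercept={a_part:.3f}  "
          f"intercept+d={a_part + d_part:.3f}  total={total:.3f}")
```

With a single dummy set the intercept and d terms move by offsetting amounts, which is why merging them works. With two or more sets, the intercept absorbs shifts from every set at once, so this merge no longer isolates the contribution of d.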

The solution
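Following Yun (2005b), with the first category of each set of dummies omitted, the wage equation is
$$y = \alpha_0 + \sum_{j=2}^{J} \gamma_j d_j + \sum_{k=2}^{K} \lambda_k q_k + \sum_{l=1}^{L} z_l'\delta_l + \varepsilon. \qquad (2)$$
This equation is called the "usual regression". Yun proposes an alternative specification that does not omit any reference group:
$$y = \alpha_0 + \sum_{j=1}^{J} \gamma_j d_j + \sum_{k=1}^{K} \lambda_k q_k + \sum_{l=1}^{L} z_l'\delta_l + \varepsilon. \qquad (3)$$
Equation (3) cannot be estimated as it stands, because each set of dummies sums to one and is therefore collinear with the intercept, so a restriction on the $\gamma$'s and $\lambda$'s is needed. Yun (2005b) shows that if we estimate a model omitting, for example, the first category of variable d, we can obtain the estimates of $\gamma_j$ that would prevail if group $r$ were omitted simply by computing $\hat\gamma_j - \hat\gamma_r$ (with $\hat\gamma_1 = 0$), the intercept becoming $\hat\alpha_0 + \hat\gamma_r$ (Yun, 2005b, p. 3). Considering this, Yun (2005b) proposes the "normalized equation":
$$y = \tilde\alpha_0 + \sum_{j=1}^{J} \tilde\gamma_j d_j + \sum_{k=1}^{K} \tilde\lambda_k q_k + \sum_{l=1}^{L} z_l'\hat\delta_l + \hat\varepsilon, \qquad (4)$$
where $\tilde\gamma_j = \hat\gamma_j - \bar\gamma$, $\tilde\lambda_k = \hat\lambda_k - \bar\lambda$, $\tilde\alpha_0 = \hat\alpha_0 + \bar\gamma + \bar\lambda$, $\bar\gamma = \frac{1}{J}\sum_{j=1}^{J} \hat\gamma_j$ and $\bar\lambda = \frac{1}{K}\sum_{k=1}^{K} \hat\lambda_k$, with $\hat\gamma_1 = \hat\lambda_1 = 0$ and the hats denoting the estimates of the usual regression (2). By construction, $\sum_{j} \tilde\gamma_j = 0$ and $\sum_{k} \tilde\lambda_k = 0$, and every fitted value is unchanged. If we obtain equation (4) for men and women separately, we can implement the Oaxaca decomposition in a way that is invariant to the choice of the omitted category in the dummy variables.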
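A minimal Python sketch of the normalization step, assuming the usual regression has already been estimated and that each set's coefficients are stored with 0.0 in the omitted category's slot (a hypothetical layout). The helper `normalize` subtracts the mean of each set's coefficients and adds those means to the intercept, following the definitions in Yun (2005b). The helper `change_reference` re-expresses the same regression with another omitted category. Both names and all numbers are illustrative assumptions. The check shows that two usual regressions differing only in the reference category of d yield identical normalized coefficients.

```python
import numpy as np

def normalize(alpha0, gammas, lambdas):
    """Turn usual-regression dummy coefficients into normalized ones.

    gammas, lambdas: coefficients for every category of d and q, with 0.0
    in the omitted category's slot. Returns (intercept, gammas, lambdas)
    such that each set of dummy coefficients sums to zero, while every
    fitted value is unchanged.
    """
    g = np.asarray(gammas, float)
    lam = np.asarray(lambdas, float)
    g_bar, lam_bar = g.mean(), lam.mean()
    return alpha0 + g_bar + lam_bar, g - g_bar, lam - lam_bar

def change_reference(alpha0, gammas, r):
    """Same usual regression re-expressed with category r of d omitted."""
    g = np.asarray(gammas, float)
    return alpha0 + g[r], g - g[r]

# Hypothetical usual-regression estimates, first category of d and q omitted.
alpha0, gammas, lambdas = 1.0, [0.0, 0.2, -0.1, 0.3], [0.0, 0.15]
a_first, g_first, l_first = normalize(alpha0, gammas, lambdas)

# Re-express with the third category of d as reference, then normalize again.
alpha_r, gammas_r = change_reference(alpha0, gammas, 2)
a_third, g_third, l_third = normalize(alpha_r, gammas_r, lambdas)

print(np.allclose(g_first, g_third), np.isclose(a_first, a_third))
print(g_first.round(3))
```

Because the normalized coefficients do not depend on which category was omitted, any detailed term built from them is invariant too. This includes the weighted sum, over categories, of women's dummy means times the difference between men's and women's normalized d coefficients.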

An application to the Brazilian case
In this section we apply the solution provided by Yun (2005b) to the identification problem. We first obtain the normalized regressions and then apply the Oaxaca decomposition. We use data from the Brazilian National Household Survey (Pesquisa Nacional por Amostra de Domicílios, PNAD), conducted annually by the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística, IBGE, the Brazilian census bureau), for three years: 1988, 1996 and 2004. Table 1 describes the data.

< Table 1 here >
First, the gender wage gap (the difference in mean log wages) narrowed in Brazil, from 0.487 in 1988 to 0.216 in 2004. Education improved substantially over the period: for instance, the proportion of men with 11 years of schooling increased by 11.3 percentage points between 1988 and 2004. Even so, women remain more educated than men. In terms of age, a larger fraction of women than of men is aged 30 to 44, while men are relatively more numerous among younger and older workers. With regard to region of residence, Brazilian workers are concentrated in the Southeast, Northeast and South regions, and, remarkably, the proportion of workers living in metropolitan areas is falling. The most interesting fact, however, is the increasing proportion of men working in part-time activities, along with a decrease in this proportion among women. Nonetheless, in 2004, 14.5% of female workers were in part-time activities, compared with only 3.4% of men.
The coefficient signs in the usual wage regressions, reported in the appendix, are as expected. Wages increase with age, and the higher the educational level, the higher the wage. The educational estimates also display a sheepskin effect, that is, a wage premium for completing a degree beyond what the additional years of schooling alone would yield. The coefficient on the white dummy has the usual sign for both genders: white workers earn more than comparable non-white workers. The regional coefficients differ between women and men. For men, wages are higher in the Midwest and Southeast and lower in the Northeast than in the North, with no significant difference between the South and the North. For women, wages are higher in the Midwest, Southeast and South and lower in the Northeast than in the North. The part-time coefficient is positive for both women and men. Finally, the metropolitan coefficient indicates that individuals living in metropolitan areas earn higher wages than others.
The appendix also reports the estimates of equation (4) for 1988, 1996 and 2004. Within each set of dummy variables the coefficients sum to zero, as required to apply the Oaxaca decomposition without the indetermination problem.
The Oaxaca decomposition separates the factors behind the wage differential between men and women into two parts: one attributed to observable characteristics and another attributed to the "market return" to these characteristics. The second part is sometimes attributed to "discrimination", because men and women receive different prices for the same characteristics. Table 2 shows the evolution of the gender wage gap and its decomposition.

<Table 2 here>
As noted above, the data show a substantial decrease in the gender wage gap between 1988 and 2004. In 1988, men's wages were 63% higher than women's, but by 2004 this advantage had dropped to 24%. Differences in characteristics reduce the gap, while differences in coefficients widen it. If we were sure that the model included all characteristics that explain the gap, this evidence would indicate discrimination favoring men. It is also worth noting that the fall in the gap over the period came mainly from the decline in the "discrimination" term. Tables 3, 4 and 5 show the detailed gender wage gap decomposition for 1988, 1996 and 2004, respectively, using both the traditional and the normalized equations.
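The percentages quoted in this paragraph appear consistent with converting the log-point gaps reported in the data description (0.487 in 1988 and 0.216 in 2004) through exp(gap) − 1. The text does not state the conversion, so this is an assumption. A short check, where `log_gap_to_percent` is a hypothetical helper:

```python
import math

def log_gap_to_percent(log_gap):
    """Percentage wage advantage implied by a gap in mean log wages."""
    return 100.0 * (math.exp(log_gap) - 1.0)

for year, log_gap in [(1988, 0.487), (2004, 0.216)]:
    print(year, f"{log_gap_to_percent(log_gap):.1f}%")
# 1988 62.7%
# 2004 24.1%
```

Rounded to whole numbers, these give the 63% and 24% reported above. Reading the log gaps directly as percentages would overstate the proportional gap by a noticeable margin in 1988.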

< Table 3 here >
< Table 4 here >
< Table 5 here >
Columns 1 and 2 show each variable's contribution to the wage gap through differences in characteristics, which is the same whether the traditional or the normalized regressions are used. The most important variables are schooling and the part-time dummy, and both reduce the wage differential. The estimates in the appendix show a positive relationship between these variables and wages, so, because women are on average more educated than men and are the majority of part-time workers, these variables help reduce the gender wage gap. The contribution of the other variables is very small. The coefficient effects, by contrast, are very different when we use the traditional instead of the normalized equation, which highlights the importance of the methodology applied here. For instance, the results using the traditional equation indicate that age reduces the gap while schooling widens it. When the normalized regression is used, the coefficient effects of schooling and age act in the same direction and lose significance. The main change, however, is that the part-time dummy gains relevance in this decomposition. Its return reduces the total differential by 14.5% in 1988 and by 56.8% in 2004; that is, women's comparative advantage in these occupations could help explain the falling wage gap between 1988 and 2004.

Conclusion
There are hundreds of works all over the world that implement the Oaxaca-Blinder decomposition. However, when a set of dummy variables is used, most of these works are subject to the identification problem shown by Oaxaca and Ransom (1999).
In this paper, we applied the solution proposed by Yun (2005a, 2005b) to the estimation of the Brazilian gender wage gap. Our first finding is that the gender gap has been narrowing in Brazil since 1988. The results also show that, because women are more educated than men and more often engaged in part-time activities, these factors help reduce the gender gap. On the other hand, the difference in the constant term between men and women accounts for the entire wage differential. However, the increasing difference in the part-time coefficients between men and women is alleviating this situation and can also be identified as a source of the narrowing gender wage gap in Brazil since 1988. Giuberti and Menezes-Filho (2005), who do not correct for the identification problem, conclude that different returns to age are important in explaining the wage gap and that the part-time dummy is not. As shown here, however, solving the indetermination problem reverses these results.
and i = m, f. For instance, d could be region of residence and, in Brazil, J = 5 (South, Southeast, North, Northeast, Midwest). Without loss of generality, $d_{i1}$ will be the omitted category. The separately estimated wage equations for type i individuals at the sample means are given by: states. Particularly, "since these restrictions do not have unique solutions," he specifies the coefficients of the normalized regression as problem of deriving the normalized regressions as finding values of $\gamma_m$ and $\lambda_m$. It turns out that their values are J