Harmonizing income classes from 2000 and 2010 Brazilian censuses

Abstract: Income variables from the Brazilian population census (IBGE) are often used as proxies for the population’s socioeconomic level in spatial analyses of urban segregation, inequality and social exclusion. However, income variables are dependent on reference values (minimum wage) that change over time, which can be challenging for multitemporal analysis. This paper discusses this issue and proposes a methodology to adjust income data that allows a meaningful comparison between the datasets of two Census periods. The methodology was applied to five medium-sized cities of the state of São Paulo by adjusting income data from Census 2000 and 2010 according to the period’s inflation rates. The analysis shows that the methodology mitigates the comparability issues. Results better reflect the changes in population composition and in residential patterns of different income groups that took place over the 2000s in Brazil in medium-sized cities.


Introduction
Brazil underwent a period of rapid economic growth between 2000 and 2010, fueled by increased demand for the commodities the country exports, including agricultural produce, oil and ore, in what became known as the "commodity boom" (Petras 2014). During this decade, unemployment rates fell sharply, while the number of formal jobs, the salaries of low-skilled workers, and the average schooling of all population groups increased (Lusting, Lopez-Calva and Ortiz-Juarez 2012; Medeiros 2015; Marques 2014b; Saboia and Hallak Neto 2018).
Alongside this economic growth, income transfer policies implemented between 2000 and 2010 contributed to the reduction of social inequalities. One such policy is the "Bolsa Família" welfare programme, which directly provided income for families in poverty and extreme poverty. The period was also marked by a significant increase in the value of the minimum wage above inflation rates, which raised the purchasing power of low-income workers. Raising the minimum wage is an internationally recognized strategy for reducing poverty and improving income distribution (Ilo 2012; Lee and Sobeck 2012).
Such rapid changes in the country's socio-economic structure present challenges for multitemporal quantitative analysis of inequalities, because, depending on the variable used, they make it more difficult to compare different time periods. This is a known issue with income data, which often prevents their use in comparative analyses since monetary values change over time and between study areas (Reardon et al. 2006). Marques and Requena (2013) highlight the exogenous factors that affect income variables in Brazil, which include inflation, income transfer policies, and changes in data collection methods.
To minimize this problem, alternative categorical variables, such as occupational classes, are often adopted instead of income (Ganzeboom, De Graff and Treiman 1992; Ilo 2012). Indeed, the classification of the population into occupational groups is considered a more realistic representation of social structure (Marques 2014a; Préteceille 1995), and it also favors international standardization (Ganzeboom, De Graff and Treiman 1992; Ilo 2012). However, data on occupational classes as currently provided in the Brazilian Census are not readily suitable for most analyses, for three main reasons. First, they are only available as a sample in the census microdata. Second, their categories are too fine-grained and require reclassification into larger and more comparable categories, using classification systems such as the Erikson-Goldthorpe-Portocarero (EGP) scheme (see Barbosa and Marschner 2013; Marques 2014b). Third, they are spatially aggregated in weighting areas, a very coarse geographic scale for intra-urban analysis, particularly in small and medium-sized cities.
Income data, conversely, are readily available at a fine geographic scale (census tracts) and for multiple census periods, making income one of the most used variables to portray socioeconomic levels in Brazil (for examples, see Feitosa et al. 2007; França 2016; Moura and Feitosa 2019; Oliveira and Silveira Neto 2015; Prado 2012; Rocha 2011; Telles 1995; Ywata et al. 2011) as well as in other countries (see Dawkins 2007; Erbe 1975; Musterd et al. 2017; Reardon and Bischoff 2011). However, as income variables present comparability issues over time, there is a need for methodologies that mitigate these problems and allow their use in comparative multitemporal analyses.
In this paper, a methodology for harmonizing income classes from the 2000 and 2010 Brazilian censuses is proposed. The next section discusses the nature of the problem and reviews existing approaches to harmonizing income data across census periods. The following section presents the proposed methodology, which is then applied to five medium-sized cities in the state of São Paulo. The results of the analyses are presented, and the advantages and limitations of the proposed method are discussed. The paper concludes with a discussion of the results in the context of interpreting income changes across time periods.

Comparability issues between income datasets across Census periods
The purpose of this section is to detail the issue of the comparability between income datasets across different census periods in Brazil and also discuss the approaches commonly adopted to address it in empirical studies.
Income data aggregated by census tracts is available from the Brazilian census in three different types of variables, all based on monthly income. These are detailed below, with references to IBGE's tables following the nomenclature adopted in IBGE's official data structure documents (IBGE 2000, 2010).
A. Total monetary value of income by census tract (sum of the income of all residents of each census tract): variable V003 from table DomicílioRenda_UF.
B. Total monetary value of income by census tract (same as A), but disaggregated by minimum wage (MW) category (such as "between 1 and 2 MW" or "between 3 and 5 MW"): variables V011 to V020 from tables PessoaRenda_UF and ResponsavelRenda_UF.
C. Population or household count by MW category, per census tract: variables V001 to V010 from tables PessoaRenda_UF and ResponsavelRenda_UF.
The total monetary value of income by census tract (variable A) allows the calculation of averages of income per person (or per household) by census tract, and variable B allows the average income disaggregated by MW classes to be computed. These variables are useful to compare average income across different study areas, but the fact they do not provide insight into the heterogeneity within census tracts limits their use. Conversely, variable C, which consists of the population or household counts by MW classes, provides the breakdown of the population distribution across categories within each census tract. As such, it shows heterogeneity within areas, revealing differences that are masked by the total or average incomes.
The fact that variable C is categorical means that it is potentially useful to study changes across two census periods, as changes in the monetary value of wages do not affect the categories. As long as the national MW is adjusted in line with economic and societal changes, MW categories portray economic groups consistently throughout time and are suitable to be used in comparison studies, albeit with the caveats of using income as a descriptor of social structure. However, variations in the MW values in Brazil have not always been in line with inflation or devaluation of currency in the period. Until 2011, there was no official law to regulate the increase of MW in Brazil (Saboia and Hallak Neto 2018) and, although such increases followed general guidelines, decisions were ultimately of political nature (Medeiros 2015).
The increase of MW values above inflation has particularly affected comparisons using income data between 2000 and 2010, when the MW rose from R$ 151.00 to R$ 510.00. The 2010 value is about 3.38 times the 2000 value, which corresponds to a nominal increase of roughly 238% and a real increase of 74.9% when inflation is discounted. The increase above inflation was intended to guarantee real economic gains and help improve the quality of life of the low-income population. Ironically, the very fact that this attempt was successful is at the root of the comparison issues between these datasets. The problem, however, is not the increase in the real economic power of the groups whose salaries are fixed by the MW, but the fact that these changes do not trickle up the socio-economic scale. Low-income groups, whose salaries are fixed by the MW, had a real gain in economic power while remaining within the same MW categories. Middle and upper income groups, whose salaries are not indexed to the MW, tended to move into a lower MW category because the MW rose faster than their wages. This resulted in an increase in the percentage of the population in lower categories, which can give the illusion of an impoverishment of the middle class.
This problem can be exemplified by the study of Machado, Zaloty and Nascimento (2019). The authors investigated the evolution of the income of heads of household between 2000 and 2010 in Santo Amaro, state of Bahia, comparing different income groups disaggregated by sex. Their results point to a decrease in the number of heads of household with income above two MW and to an increase in the number of heads of household with an income of up to 1 MW, suggesting that the population became impoverished in the period (Machado, Zaloty and Nascimento 2019). However, since it is known that the MW increased between 2000 and 2010 and that this was reflected in reductions in social inequality, unemployment and poverty (see Lusting, Lopez-Calva and Ortiz-Juarez 2012; Medeiros 2015; Marques 2014b; Saboia and Hallak Neto 2018), it is likely that their results were affected by the aggregation of income data in MW bands.
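The arithmetic behind the MW figures quoted above can be checked with a short sketch. It uses only the two nominal MW values and the inflation-adjusted value of the 2000 MW (R$ 291.62) reported in the Methodology section; the worker earning three MW is a hypothetical case added purely for illustration.

```python
# Values taken from the text.
MW_2000 = 151.00           # nominal MW in 2000 (R$)
MW_2010 = 510.00           # nominal MW in 2010 (R$)
MW_2000_ADJUSTED = 291.62  # 2000 MW adjusted for inflation up to 2010 (R$)

# Percentage changes over the decade.
nominal_increase = (MW_2010 / MW_2000 - 1) * 100
implied_inflation = (MW_2000_ADJUSTED / MW_2000 - 1) * 100
real_increase = (MW_2010 / MW_2000_ADJUSTED - 1) * 100

# Hypothetical worker (illustration only): earns 3 MW in 2000 and has a
# wage that merely keeps pace with inflation until 2010.
wage_2000 = 3 * MW_2000
wage_2010 = wage_2000 * MW_2000_ADJUSTED / MW_2000
wage_2010_in_mw = wage_2010 / MW_2010

print(f"Nominal MW increase: {nominal_increase:.1f}%")
print(f"Implied inflation:   {implied_inflation:.1f}%")
print(f"Real MW increase:    {real_increase:.1f}%")
print(f"Hypothetical wage in 2010: {wage_2010_in_mw:.2f} MW")
```

Under these figures the hypothetical worker keeps the same purchasing power but falls from the "2 to 3 MW" band to the "1 to 2 MW" band, which is the downward reclassification described above.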
The recognition of these data problems has motivated researchers to adopt strategies to mitigate the effects of variations in the MW. Marques and Requena (2013) use a relative average income approach to harmonize income data between censuses. The relative average income of each census tract was computed using variable A (total monetary value of income by census tract) divided by the average income of the entire study area in each period, thus normalizing income values between censuses. Using this strategy, Marques and Requena (2013) are able to compare data from the 1991, 2000 and 2010 censuses in order to analyze territorial transformations in the metropolitan area of São Paulo. França (2016) applies the same strategy to compare income variations associated with racial composition between 2000 and 2010.
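The relative average income approach can be sketched as follows. This is a minimal illustration and not the implementation used by Marques and Requena (2013): the function name, the dictionary layout and the example figures are hypothetical, and the study-area average is assumed here to be total income divided by total population.

```python
def relative_average_income(total_income, population):
    """Return each tract's average income relative to the study-area average.

    total_income: dict mapping tract id -> total income (variable A)
    population:   dict mapping tract id -> number of people (or households)
    Assumption: the study-area average is total income / total population.
    """
    area_average = sum(total_income.values()) / sum(population.values())
    return {
        tract: (total_income[tract] / population[tract]) / area_average
        for tract in total_income
    }


# Made-up example with two tracts of equal size.
income = {"tract_1": 1000.0, "tract_2": 3000.0}
people = {"tract_1": 10, "tract_2": 10}
relative = relative_average_income(income, people)
print(relative)
```

A value below 1 indicates a tract poorer than the study-area average in that census year, and a value above 1 a richer one, so the ratios can be compared across censuses without reference to monetary values.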
Another commonly used strategy is adjusting monetary income values for inflation. Gama and Machado (2014), for example, studied factors that influenced regional migration between 2000 and 2010, such as income. The authors applied the INPC (National Consumer Price Index, translated from Índice Nacional de Preços ao Consumidor) inflation index to the individual income available in the microdata of each census, making those values comparable. A similar approach is documented by Berquó and Cavenaghi (2014), who applied the INPC to adjust income values from the year 2000 and then reclassified them into MW bands according to the 2010 MW value. Ywata et al. (2011) also used the INPC to compare income datasets from 1994 to 2009 from the RAIS (Annual Report of Social Information, from the original Relação Anual de Informações Sociais) of the Brazilian Ministry of Labor and Employment. An important limitation of directly adjusting income values for inflation is that this strategy requires income information provided as a continuous variable. This is the case in the studies discussed above, which use census microdata and the RAIS dataset. The latter includes only formally employed individuals and does not cover the entire population, while the census microdata is only available in large spatial aggregation units (weighting areas) rather than census tracts.
The strategy of adjusting income values for inflation can be easily applied to variable A (total income) at the census tract level. For the categorical variable C, however, the challenge lies in ensuring that the correct number of households or people is reallocated to the appropriate categories once the category boundaries are updated. The classification of people into categories is based on individual data that are not available in the aggregated form, so the number of people in each category has to be estimated using a specific methodology. A similar problem affects variable B, which is the sum of the income of all people or households initially allocated to each category.
International strategies for handling population distributions reported in categories are also worth mentioning. Blanchet, Piketty and Fournier (2021) developed a method that transforms categorical data into a continuous distribution based on generalized Pareto curves, which can then be categorized again. A similar example is the Pareto-lognormal distribution proposed by Hajargasht and Griffiths (2013). In both cases, however, the population distribution as a whole is used, which again leads to a loss of spatial resolution in the data.
As such, there is a clear need for methods to harmonize income data available at the census tract level, which covers the entire population at a fine spatial scale. The methodology proposed in this article, detailed in the next section, aims to bridge this gap.

Methodology
Census income data in variables B and C can be obtained from tables PessoaRenda_UF and ResponsavelRenda_UF, which contain information per working individual and per head of household, respectively. These data are categorized into 10 classes, whose boundaries can be seen in Table 1. For simplicity, the nine classes with declared income were named A to I in this study, from the highest to the lowest income. The "no income" category (variable V020 from tables PessoaRenda_UF and ResponsavelRenda_UF) is problematic because it includes both people with no actual income and people who did not disclose their income to the interviewer (IBGE 2012; Hoffman and Ney 2008; Osório et al. 2011). Thus, this category was removed from the analysis. Taking the 2000 census data as the basis, the 2010 income variables for heads of households were reclassified. The process was carried out in two stages, combining information from variables B and C: first, the average income of the heads of households was computed for each census tract and income range; second, households were reclassified according to the MW value of the year 2000 adjusted by the INPC inflation index (Figure 1).
The average household income of each income class by census tract was computed by dividing the income total in variable B (for example, the total income of households with income between one and two MW) by the number of households in variable C (those with income between one and two MW). The average income of each class in each census tract was then used to reclassify all households of that class and tract into income classes (A to I). For 2000, the classes were defined by the 2000 MW (R$ 151.00). For 2010, they were defined by the 2000 MW adjusted for inflation (R$ 291.62).
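The two stages described above can be sketched for a single census tract as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function and variable names are hypothetical, the example figures are made up, the band limits for classes D, B and A are assumed to follow IBGE's standard MW bands (Table 1 is the authoritative source), and upper limits are assumed to be inclusive.

```python
from bisect import bisect_left

# Classes from lowest (I) to highest (A) income.
CLASS_LABELS = ["I", "H", "G", "F", "E", "D", "C", "B", "A"]
# Upper limits of classes I..B in multiples of the reference MW; class A is
# open-ended. The limits 10 (D) and 20 (B) are assumptions for illustration.
UPPER_LIMITS_MW = [0.5, 1, 2, 3, 5, 10, 15, 20]

MW_2000 = 151.00           # reference MW for 2000 data (R$)
MW_2000_ADJUSTED = 291.62  # 2000 MW adjusted for inflation, for 2010 data (R$)


def classify(income, reference_mw):
    """Return the class of an income expressed relative to reference_mw."""
    ratio = income / reference_mw
    # bisect_left makes the upper limit of each band inclusive (assumption).
    return CLASS_LABELS[bisect_left(UPPER_LIMITS_MW, ratio)]


def harmonize_tract(income_by_class, households_by_class, reference_mw):
    """Reallocate households of one tract using each class's average income.

    income_by_class:     original class -> total income (variable B)
    households_by_class: original class -> household count (variable C)
    """
    adjusted = {label: 0 for label in CLASS_LABELS}
    for label, households in households_by_class.items():
        if households == 0:
            continue  # no average can be computed for an empty class
        average = income_by_class[label] / households
        adjusted[classify(average, reference_mw)] += households
    return adjusted


# Made-up 2010 tract, originally classified with the 2010 MW (R$ 510.00).
income_2010 = {"H": 16000.0, "G": 24000.0, "F": 13000.0}
households_2010 = {"H": 40, "G": 30, "F": 10}
adjusted_2010 = harmonize_tract(income_2010, households_2010, MW_2000_ADJUSTED)
print(adjusted_2010)
```

In this made-up tract every original class moves up one band once the inflation-adjusted 2000 MW is used as the reference, while the total number of households is preserved. Because all households of a class in a tract share one average, they are always reallocated together.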

Empirical application: study areas
To demonstrate the potential of the methodology, five medium-sized cities in the State of São Paulo were selected: Araçatuba, Bauru, Marília, Presidente Prudente and São José do Rio Preto. These five cities are located in the northwest and midwest regions of the State of São Paulo (Figure 2). In this region of the State, the occupation was marked by temporal and spatial differences that impacted the emergence and development of urban centers and took place according to the interests of the pioneers, as well as the conditions of the land and allocation of transport infrastructure, with emphasis on the installation of the railway lines (Mombeig 1984). São José do Rio Preto was the first of those cities to be settled (1852) and Marília the last one (1929).
Three of those cities (Araçatuba, Bauru and São José do Rio Preto) are classified as urban agglomerations, while the other two (Marília and Presidente Prudente) are classified as urban centers (IPEA, IBGE and UNICAMP 2001). The first group is classified as such because of the spatial continuity between these cities and the smaller ones that surround them, while the second group is formed by relatively isolated cities (IPEA, IBGE and UNICAMP 2001). As a common characteristic, they play a polarizing role in relation to their surroundings (Melazzo 2006), which means that these cities have a greater amount and diversity of services and commercial activities that attract people from the surrounding smaller cities. These cities are classified as medium-sized by different authors, based on relevant characteristics such as the restructuring of their urban centralities (Sposito and Góes 2015), the presence of multiple gated communities (Rossi 2016; Zandonadi 2008), the formation of spaces of exclusion (Nunes 2007), social inequality (Melazzo 2006) and segregation (Araujo, Barros and Queiroz 2018; Silva 2020). These cities have been analyzed by Melazzo (2006), who compared their intra-urban spatial patterns of social inclusion/exclusion, and by Gomes (2007), who studied four of them, analyzing changes in the industrial production process that took place in the 1990s and their implications for the social structure. Both pointed out similarities between them.
The study of medium-sized cities also imposes the need to work with fine-scale data. Unlike in metropolitan regions, it is necessary to use data aggregated by census tracts to analyze their intra-urban heterogeneities, as weighting areas are too large to be suitable, as illustrated in Figure 3.

Results and Discussion
As noted earlier, families that earned multiple MW tended to be classified, in 2010, into categories below their original category from 2000 (Table 3), even though their actual income increased in the period (but not as fast as the MW itself).This trend is shown in Figure 4, where the percentage of families in the classes H and G (0.5 to 1 and 1 to 2 MW) increases significantly in the original classification for 2010 data (blue bar) in all of the study areas.
A quick look at the graphs may lead to the interpretation that incomes dropped sharply in the period, which is not in line with reality.
The resulting population composition according to the adjusted classification proposed by the methodology can also be seen in Figure 4 (green bar). The plot shows a significant increase in the proportion of class F (2 to 3 MW) in all study areas. It also shows a sharp decrease in the proportions of the lower income classes I and H (0 to 0.5 and 0.5 to 1 MW), reflecting the decrease in poverty that happened during that decade. Regarding the upper classes, a small increase can be seen in the proportion of class C (10 to 15 MW), accompanied by a decrease in the richer economic classes A and B (who earn 15 or more MW). All graphs present an overall trend of increase in the proportion of the middle classes and decrease in the proportions of the lower and upper classes, thus indicating economic gains of the lower classes and a decrease in inequality during that decade.
A similar misinterpretation, that poverty increased in the period rather than decreased, can be caused by maps showing the spatial distribution of income classes. Figure 5 shows the predominant income group in each census tract in 2000 and 2010. The map was prepared by assigning to each census tract the income group with the highest number of heads of household. Again, a quick look at the maps using the original data from 2010 may suggest that the city was poorer overall in 2010 than in 2000, which is the opposite of the observed reality.
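The rule used to build the predominant income group map can be sketched as below. The function name and the example counts are hypothetical, and the handling of empty tracts and ties is an assumption, since the text does not say how such cases were treated.

```python
def predominant_class(households_by_class):
    """Return the class with the most heads of household in a tract.

    Returns None for a tract with no households. If two classes tie, the
    first one in the dictionary's order is returned (assumption).
    """
    if sum(households_by_class.values()) == 0:
        return None
    return max(households_by_class, key=households_by_class.get)


# Made-up tract: counts of heads of household by income class.
tract = {"I": 5, "H": 40, "G": 30, "F": 10}
print(predominant_class(tract))
```

Applying the same rule to the original and to the adjusted 2010 counts of a tract shows how the choice of reference MW can change the class drawn on the map.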
The predominant socioeconomic group map of 2010 adjusted for inflation, also in Figure 5, presents a pattern more in line with the decreased income inequality of the period. In those maps, areas predominantly occupied by lower income households (class I, with income between 0 and 0.5 MW) are almost completely absent. Areas predominantly occupied by the higher income households (classes A and B) have also decreased. Conversely, middle-income classes (classes G and F) predominate in a greater number of census tracts in all cities of the study area. To provide a more detailed view of the effects of the proposed new classification, the map of the ratio of income groups by census tract for São José do Rio Preto (Figure 6) is presented here as an example. Although we selected the city of São José do Rio Preto, these effects are notable in the other four cities as well. The map was prepared by calculating the proportion of each income group by census tract. The maps in the 2010 column (original classification) show larger proportions of heads of household in the lower income classes in comparison to the 2000 maps. The 2010 adjusted column, however, indicates that most of the population is classified in the middle-income ranges (classes G to C). The 2010 adjusted maps also show a near absence of people in the lowest income classes (I and H). Income class B is also almost absent in the new classification, although there is still a significant number of families classified in the highest income class (A).
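The proportion of each income group within a census tract, as mapped in Figure 6, can be computed with a sketch like the following. The function name and the example counts are hypothetical, and returning zeros for an empty tract is an assumption.

```python
def class_shares(households_by_class):
    """Return the share of each income class within one census tract."""
    total = sum(households_by_class.values())
    if total == 0:
        # Empty tract: no shares can be computed (assumption: report zeros).
        return {label: 0.0 for label in households_by_class}
    return {label: count / total for label, count in households_by_class.items()}


# Made-up adjusted 2010 counts for one tract.
shares = class_shares({"G": 40, "F": 30, "E": 10})
print(shares)
```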
Overall, the new classification reflects the better living conditions of the lower income households, the decrease in income inequalities and the greater income distribution that took place during the 2000s in Brazil, as reported in the literature (Lusting, Lopez-Calva and Ortiz-Juarez 2012; Marques 2014b; Wesisbrot, Johnston and Lefebvre 2014). It also illustrates that the use of data without prior harmonization in comparative multitemporal analyses can result in biased conclusions (Ganzeboom, De Graff and Treiman 1992; Ilo 2012; Reardon et al. 2006).

Final Remarks
In this paper, a methodology for harmonizing income MW categories from the 2000 and 2010 Brazilian censuses was proposed. MW categories from 2010 are redefined using the 2000 MW, adjusted for inflation, as a reference. The adjusted classification provides a more accurate view of income distribution in 2010. It also allows a more intuitive comparison between IBGE's 2000 and 2010 census income data. The methodology was developed to be applied to aggregated data in the smallest available aggregation unit (census tract), allowing intra-urban comparative analyses without the loss of spatial resolution that commonly happens with methodologies that use the total population composition of the study area.
A limitation, which requires further improvements and studies, is the use of the arithmetic mean as the parameter to estimate the categories. It is known that, for skewed distributions, the mean is not a good representation of the center of the sample. The income curve is usually skewed when the total population of the study areas is considered. However, on a smaller scale (census tract), the similarity between neighbors tends to be higher, in which case the mean is a suitable option. Other parameters, nonetheless, might be able to provide a better fit.
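The sensitivity of the mean to skewness can be illustrated with two made-up groups of incomes; the figures below are hypothetical and serve only to show the effect, not to describe any census tract in the study.

```python
from statistics import mean, median

# Hypothetical incomes (R$): a skewed group and a homogeneous group.
skewed_group = [300, 350, 400, 450, 5000]
homogeneous_group = [380, 400, 420]

skewed_mean, skewed_median = mean(skewed_group), median(skewed_group)
homog_mean, homog_median = mean(homogeneous_group), median(homogeneous_group)

print(f"Skewed group:      mean={skewed_mean}, median={skewed_median}")
print(f"Homogeneous group: mean={homog_mean}, median={homog_median}")
```

In the skewed group a single high income pulls the mean far above the median, whereas in the homogeneous group the two coincide, which is the situation the mean-based reclassification relies on.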
The advantage of the methodology is that it is easily adaptable and applicable to other study areas, including metropolitan areas. It can easily be extended to other continuous variables that are presented as grouped data and whose reference values change over time, such as educational achievement scores and housing values. Unlike occupational information, it does not require the use of programming tools like R or software to handle extensive databases; it can be easily implemented using spreadsheets. The methodology's main contribution is a more realistic view of empirical reality, which opens avenues for studies that aim to analyze changes in social exclusion and mobility, spatial patterns of segregation, inequality and poverty over time.

Figure 2: Location map of the study areas. Marília, for example, has 269 urban census tracts and only 11 weighting areas, which also encompass a much larger part of the rural area. Source: IBGE (2010), prepared by the authors (2022).

Figure 3: Comparing census areal units of data aggregation (census tracts and weighting areas).

Figure 4: Population composition by income class in the years 2000 and 2010.

Figure 6: Population of each socioeconomic group by census tracts (2000 and 2010) in the city of São José do Rio Preto.

Table 1: Census income classes

Table 2: Evolution of the Brazilian national MW and classification effects