Composition of the Health Inequality Index analyzed from the inequalities in mortality and socioeconomic conditions in a Brazilian state capital

The issue of social inequalities is a subject of recurrent studies and remains relevant due to the growing trend of these inequalities over the years. This study proposes the creation of the Health Inequality Index (HII) composed of health indicators – Mean life span and Mean Potential Years of Life Lost (PYLL) – and socioeconomic indicators of income, schooling, and population living in poverty in the city of Natal, the State Capital of Rio Grande do Norte, Brazil. To this end, a probabilistic linkage was made between mortality and socioeconomic databases in order to capture the census tracts of households with death records from 2007 to 2013. The authors used the Principal Component Factor Analysis to calculate the index. The Health Inequality Index showed areas with worse socioeconomic and health conditions located in the suburban areas of the city, with differences between and within the districts. The difference in the mean life span between the districts of Natal reaches 25 years, and the worst district has mortality rates comparable to poor African countries. Public policymakers can use the index to prioritize actions aimed at reducing or eliminating health inequalities.


Introduction
Several academic studies and government reports reveal differences in health outcomes between different population groups. These differences are not new: in England, the cradle of the Industrial Revolution, social differences in the outcome of diseases and deaths 1 have been observed since the 19th century. If the theme has been so extensively studied, what keeps social inequalities an object of recurrent study in different areas of knowledge? One of the reasons should be the expansion of these inequalities over the years, a phenomenon described by Thomas Piketty 2 in its economic aspects and by Diderichsen et al. 3 and Marmot 4 regarding the health of population groups. If this social gap has deepened in recent decades, despite the volume of evidence already published on the subject, it can be suggested that there is a gap between the production of knowledge and attempts to resolve this problem. Thus, endeavors to keep the subject of social inequalities current and relevant are reinforced 5,6 .
Scientific interest in the subject has grown, especially since the 2000s, when studies on the social determinants of health multiplied, along with evidence relating the contextual factors of life in society to the health outcomes of populations 7 . Even so, the theme is not included in the political agenda, or the initiatives in this direction remain timid.
On this topic, Marmot and Bell 8 are clear in stating that health inequities are a topic of public health interest and, if they are, they become a concern of epidemiology. The epidemiological method has contributed to show that the socioeconomic position of individuals, measured mainly through income, schooling, and occupation, is associated with health outcomes 9 .
However, epidemiological interest in the health effect of the district has grown in recent years. An example is the search for associations between areas with worse socioeconomic conditions and health outcomes, such as mortality 10 . The epidemiological analysis of health inequities measured at the aggregate level contributes to sectoral actions, e.g., health surveillance, and allows directing intersectoral public policies aimed at reducing or eliminating these inequalities 11 .
A supported method that can summarize several aspects encompassing the socioeconomic characterization of the district, such as income, occupation, and schooling, is the composition of indices 12,13 . In Brazil, some index creation initiatives aim to analyze the territory to assist in government planning, inside and outside the health sector 14,15 .
Even with the use aimed at the health area, most of the proposed indices have not used epidemiological morbimortality variables in the construction of these composite indicators. It is, thus, understood that there is a gap in the construction of indicators that aggregate variables representing the socioeconomic status, which are social determinants of health, and variables of health outcomes. In Brazil, these indicators can be combined from population census data from the Brazilian Institute of Geography and Statistics (IBGE) and aggregation of mortality data, for example, provided by the Mortality Information System (SIM) of the Ministry of Health.
This study aims to contribute to the inclusion of social determinants in the political agenda of health management, as it sets out to develop an index that measures health inequality, based on socioeconomic indicators and mortality data in Potential Years of Life Lost (PYLL) in the city of Natal, Rio Grande do Norte (RN), Brazil, enabling the planning and monitoring of health actions in the territory.

Methods
The study's analysis units comprise the 36 official districts of Natal (RN) and its 895 census tracts, characterizing it as an ecological study.
The study employs quantitative variables from two different sources. The mean life span and PYLL variables were calculated from the death records of the SIM database. The income, schooling, and population in poverty variables were calculated from data of the IBGE 2010 Demographic Census. The independent variables that subsequently became part of the index were selected because they are relevant for social stratification (income, schooling, and occupation) 5 and were disaggregated at the census tract level in the 2010 census. Occupation was not included in the analysis because no information on it was available at this level. The same was true of schooling in years of study, which could have provided a better breakdown of population groups.
We selected all deaths of residents of the city of Natal (RN) that occurred between 2007 and 2013 in order to compare the mortality data, specifically the PYLL and the mean life span, with the socioeconomic data of the 2010 demographic census. Including the three years before and the three years after the census year was intended to avoid fluctuations in the results caused by the low number of records, especially for small areas such as the census tracts.
The PYLL is an epidemiological measure commonly used in public health that portrays premature mortality, since it involves the estimated years a person could have lived had they not died prematurely. This study adopted the cutoff age initially established for the measure, 70 years of age 16 , which has already been used in Brazil and internationally 17,18 . Additionally, PYLL means were calculated for three main groups of causes of death: infectious and parasitic diseases, chronic non-communicable diseases, and external causes, as per the classification proposed by Nogueira 19 . The categorization of causes of death is in line with the three groups of causes proposed in the study of the Global Burden of Diseases 20 .
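The PYLL calculation described above can be illustrated with a minimal sketch. It assumes that each death contributes the difference between the 70-year cutoff and the age at death, that deaths at or above the cutoff contribute zero, and that the mean is taken over all deaths; the text does not state the denominator, so that choice is an assumption, and the ages are made-up values.

```python
def pyll(ages_at_death, cutoff=70):
    """Return (total PYLL, mean PYLL per death) for a list of ages at death.

    Assumption: the mean is taken over all deaths, including those that
    occurred at or above the cutoff (which contribute zero years lost).
    """
    lost = [max(cutoff - age, 0) for age in ages_at_death]
    total = sum(lost)
    mean = total / len(lost) if lost else 0.0
    return total, mean


# Hypothetical ages at death for one district (not data from the study).
ages = [2, 25, 68, 70, 83]
total_pyll, mean_pyll = pyll(ages)
print(total_pyll, mean_pyll)
```

Restricting the input list to deaths from one group of causes gives the cause-specific mean in the same way.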
Data on mortality were provided by the State Health Secretariat of Rio Grande do Norte (SESAP-RN) in electronic spreadsheets in October 2018, after an ethical appraisal and a favorable opinion, since it was necessary to include the addresses of the deceased. The socioeconomic data aggregated by census tracts and the address data used for the linkage of the databases were collected from the IBGE website (https://censo2010.ibge.gov.br/), accessed in October 2018.
Data analysis can be divided into three stages, with different procedures in each one. It was necessary to standardize the addresses in the SIM database before the first data analysis. Subsequently, in the first stage of the analysis of the study, we performed a descriptive analysis of the data on the deaths, calculating the mean life span and the mean PYLL by district and administrative area of the city of Natal (RN).
The overall mortality coefficients (OMC) were also calculated, considering that the values used as a numerator (total deaths per district) correspond to a mean of the seven years covered by the study. The indicators were standardized by age group (0-4 years; 5-19 years; 20-44 years; 45-64 years; 65 years and over) by the direct method, using the 2010 population of Natal (RN) as the standard, so that the coefficients could be compared accurately across districts. The district with the lowest standardized OMC was identified from this calculation, and the excess deaths were computed for each district in comparison with this standard district (lowest standardized OMC value).
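A minimal sketch of direct standardization and of the excess-death comparison follows. The age groups are those named in the text; the counts are invented, and applying the reference coefficient to the district's total population to obtain expected deaths is a simplifying assumption, since the exact computation is not detailed here.

```python
AGE_GROUPS = ["0-4", "5-19", "20-44", "45-64", "65+"]


def standardized_rate(deaths, population, standard_population, per=1000):
    """Directly standardized mortality coefficient per `per` inhabitants."""
    std_total = sum(standard_population[g] for g in AGE_GROUPS)
    expected_in_standard = 0.0
    for g in AGE_GROUPS:
        age_specific_rate = deaths[g] / population[g]
        # Deaths the standard population would have at the district's rates.
        expected_in_standard += age_specific_rate * standard_population[g]
    return per * expected_in_standard / std_total


def excess_deaths(observed, population_total, reference_rate, per=1000):
    """Observed deaths minus deaths expected at the reference coefficient."""
    expected = reference_rate * population_total / per
    return observed - expected


# Hypothetical figures (not data from the study).
standard = {"0-4": 1000, "5-19": 3000, "20-44": 4000, "45-64": 1500, "65+": 500}
district_pop = {"0-4": 500, "5-19": 1000, "20-44": 2000, "45-64": 900, "65+": 400}
district_deaths = {"0-4": 2, "5-19": 1, "20-44": 6, "45-64": 9, "65+": 20}

rate = standardized_rate(district_deaths, district_pop, standard)
excess = excess_deaths(sum(district_deaths.values()),
                       sum(district_pop.values()),
                       reference_rate=3.0)
print(round(rate, 2), round(excess, 1))
```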
Spearman's correlation between the health variables, represented by the mean life span and mean PYLL, and the socioeconomic indicators was evaluated. To this end, a deterministic linkage was carried out using the name of the district as a key field, since this field was present in both the SIM and IBGE databases. At this stage, we decided to remove the Salinas district: it is a large, sparsely populated mangrove area that recorded few deaths in the period, which generated inconsistencies in the analyses.
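As an illustration of this step, the sketch below joins two district-level tables on the district name and computes Spearman's coefficient as the Pearson correlation of average ranks. The district labels and values are hypothetical, and the functions are a generic implementation rather than the routine used in the statistical software.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical district tables keyed by district name (the linkage key).
life_span = {"A": 74.5, "B": 49.0, "C": 61.2, "D": 66.8}
income = {"A": 2950.0, "B": 210.0, "C": 540.0, "D": 1200.0, "E": 800.0}

# Deterministic linkage: keep only districts present in both sources.
linked = sorted(set(life_span) & set(income))
rho = spearman([life_span[d] for d in linked], [income[d] for d in linked])
print(linked, round(rho, 3))
```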
In the second stage of the analysis, the free software RecLink III (version 3.1.6.3160) was used to perform a probabilistic linkage between the SIM database and the IBGE address database for statistical purposes, in order to add the census tract field to the death database and thus allow the analysis of a geographic aggregate smaller than the district. This stage required standardizing and homogenizing the death database, especially the "district" and "residence type" fields, so that they were compatible with those used by the IBGE address database.
The parameters used in the linkage routine for the street name followed those recommended by the technical manual of the software for linking people's names 21 . The Soundex of the first name of the record (PBLOCO), the Soundex of the last name of the record (UBLOCO), and the initials of the middle names of the record (FNOMEI), applied to the names of the public places, were used as parameters of the blocking step. The "street type" and "district" fields were also used to compare the records.
As a cut-off point for the scores obtained in the database combination stage, those with values above 10.0 were considered pairs. The other records were considered non-pair, and there was no manual analysis of the records with scores lower than the previously mentioned value.
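The blocking and cut-off logic can be sketched as follows. This is an illustration under stated assumptions, not the RecLink routine: it uses the standard American Soundex (the software's phonetic coding may differ), builds the blocking key from the first and last words of the street name only, and treats the comparison score as given. The street names are illustrative.

```python
from collections import defaultdict

CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Standard American Soundex code (one letter followed by three digits)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and unmapped letters give ""
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def blocking_key(street_name):
    """Hypothetical key: Soundex of the first and of the last word."""
    words = street_name.split()
    if not words:
        return ("", "")
    return (soundex(words[0]), soundex(words[-1]))


def candidate_pairs(death_streets, address_streets):
    """Compare only records that share the same blocking key."""
    index = defaultdict(list)
    for j, street in enumerate(address_streets):
        index[blocking_key(street)].append(j)
    return [(i, j) for i, street in enumerate(death_streets)
            for j in index.get(blocking_key(street), [])]


CUTOFF = 10.0


def is_pair(score):
    """Scores above the cut-off are accepted as pairs; all others are not."""
    return score > CUTOFF


deaths = ["Prudente de Morais", "Hermes da Fonseca"]
addresses = ["Prudente de Moraes", "Hermes Fonseca", "Salgado Filho"]
print(candidate_pairs(deaths, addresses))
```

Blocking keeps the number of comparisons manageable: only records falling in the same block are scored, and the cut-off then decides which of those scored candidates count as pairs.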
With the database resulting from the probabilistic linkage, a descriptive analysis of the sample was carried out. Again, the mean life span and PYLL means were calculated. A correlation analysis between death data and socioeconomic indicators was performed by statistical software IBM SPSS (version 23).
After performing the probabilistic linkage, a multivariate data analysis was conducted to create the Health Inequity Index (HII). We decided to calculate the index from an exploratory factor analysis by principal components (PCA) of the Z-scores of the dependent variables (mean life span, mean overall PYLL, and mean PYLL by groups of causes) and independent variables (Mean Monthly Income Per Household Per Capita, Proportion of literate people, and Proportion of low-income households).
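The idea of extracting a single component from standardized variables can be sketched in plain Python. This is a minimal illustration, not the SPSS procedure: it finds the first principal component of the correlation matrix by power iteration and returns unstandardized component scores, whereas the software's factor scores may be scaled differently. The three variables and their values are hypothetical.

```python
def zscores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def first_principal_component(columns, iterations=500):
    """Return (weights, scores, explained_share) for the first component."""
    z = [zscores(col) for col in columns]
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized variables.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = corr[0][:]  # starting vector for the power iteration
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    if v[0] < 0:  # the sign is arbitrary; orient by the first variable
        v = [-x for x in v]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    scores = [sum(v[j] * z[j][k] for j in range(p)) for k in range(n)]
    return v, scores, eigenvalue / p


# Hypothetical values for five census tracts (not data from the study).
life_span = [74, 70, 62, 55, 49]
mean_pyll = [5, 8, 14, 20, 27]
income = [2900, 1500, 800, 400, 200]

weights, scores, share = first_principal_component([life_span, mean_pyll, income])
print([round(w, 2) for w in weights], round(share, 2))
```

With this orientation, higher scores correspond to longer life span and higher income, and lower scores to higher PYLL.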
To validate the results obtained with the extraction of the factor, a random sample of 30% of the census tract data set was drawn using the random selection available in the statistical software IBM SPSS. A new factor analysis (PCA) was performed on this sample with the same parameters as the complete analysis, and the validity of the proposed method was then assessed.
Finally, HII thematic maps were prepared, using the map files in the Shapefile format made available by IBGE when the analysis was performed for the census tracts, and provided by the Municipal Urbanism Secretariat of Natal (RN) when the analysis was made for the city's districts. The QGIS software (version 2.18.18) was employed to create the maps. The project was approved by the Research Ethics Committee of the Federal University of Rio Grande do Norte on September 10, 2018.

Results
The number of death records obtained after standardization, homogenization, and exclusions in the database was 30,546 deaths with at least the district of residence registered. This represents 93% of the records found in the database sent by SESAP-RN, and 97.3% of the total deaths officially announced by SIM through DATASUS (31,403 deaths).
Table 1 shows the difference in the mean life span between the districts of Tirol (74.51), located in the east, and Guarapes (49.03), located in the west. This represents a difference of 25 years that are no longer lived by individuals living in the same city. On the one hand, the mean life span in Guarapes is comparable to the life expectancy of the Democratic Republic of Congo (49 years) or Somalia (50 years), a country that has been living in civil war for decades. On the other hand, the mean life span in Tirol is more compatible with the Brazilian mean life expectancy for the same period (74 years) 22 .
Table 1 also shows the general mortality rates by district and the OMC standardized by age groups, as per the direct standardization method. The Salinas district had the lowest standardized OMC, but due to its small population and low number of deaths, we decided to use the second-lowest value for the indicator, represented by the Capim Macio district, in the south zone (3.0). The OMC ranged from 7.7 in a district in the east region to 3.0 in the district with the lowest coefficient, meaning that the highest coefficient was about 2.5 times the lowest.
Excess deaths were obtained by calculating the number of deaths expected in each district if it had the same OMC as the Capim Macio district and taking the difference from the number of deaths recorded between 2007 and 2013. Alecrim (east zone) was the district with the highest excess, with 1,250 deaths.
The means of PYLL by large groups of causes are shown in Table 1. The group of external causes has the highest values, since violence and accidents mostly affect younger age groups. The locations with the worst socioeconomic indicators (Salinas, Felipe Camarão, and Guarapes) are located on the west and north sides of the city, and are also among those with the lowest mean life span, as well as the highest PYLL means in all groups of causes. On the other hand, the districts with the best socioeconomic results (Petrópolis, Tirol, and Capim Macio) are located in the eastern and southern areas and have the best mortality results, with a high mean life span and low mean PYLL. The Guarapes district has the second-lowest mean per capita household income (R$ 209.37 monthly), which is 14 times lower than in the Tirol district (R$ 2,951.96) and almost 16 times lower than in Petrópolis (R$ 3,315.12), located on the east side.
The pattern of inequality may seem less noticeable for the variable "proportion of literate people", but it still reveals that one in four residents aged 15 years and over in the Guarapes district is illiterate. In contrast, the Capim Macio district has a proportion of literate people of over 98%, a pattern similar to the districts of Petrópolis, Tirol, and Pitimbu, all located in the east and south regions of the city. Table 2 shows the values of Spearman's correlations for the study variables. It contains positive correlations, such as that between the mean life span and the mean household income per capita, and negative correlations, such as that between the mean life span and the proportion of low-income households. Table 2 also shows Spearman's correlations between dependent and independent variables after performing the probabilistic linkage. Of the 757 census tracts resulting from the merging of databases, 754 tracts showed values for all variables. While all correlations remain statistically significant at this stage, they were weaker than in the model with all 895 tracts.
Given the representativeness of tracts obtained in the linked database (SIM-IBGE addresses), a factor analysis with the combined (death and socioeconomic) variables was carried out. The factor analysis by principal components showed a good fit of the model, with the KMO test above 0.80 and the component explaining 60% of the variance. Despite the good fit, only 515 tracts were computed, and 242 were not calculated, as a result of the high number of tracts without information on deaths broken down by groups of causes.
When removing the mean PYLL by groups of causes, maintaining the overall mean PYLL, the model had a worse fit (KMO 0.68) but improved the proportion of variance of the component (67%). The result of the validation of the factor analysis from the sample of valid tracts (n = 230) showed proximity to the data obtained by the complete analysis, with KMO of 0.66 and component variance of 66%.
After categorizing the factor into three groups based on tertiles, a descriptive analysis of the groups was carried out to facilitate comparison and visualization between the different levels of the areas in the municipality of Natal (RN); the results can be seen in Table 3. A gradient is worth noting, running from the worst health and socioeconomic conditions in Category 1 to the best conditions in Category 3.
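The tertile categorization can be sketched as a rank-based split into three groups of equal size. The sketch assumes that lower scores indicate worse conditions, so that the lowest third becomes Category 1; tied scores would be split arbitrarily by this simple rule, and the scores are hypothetical.

```python
def tertile_categories(scores):
    """Assign 1 (lowest third) to 3 (highest third) by rank of the score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    categories = [0] * n
    for rank, i in enumerate(order):
        categories[i] = 1 + (3 * rank) // n
    return categories


# Hypothetical factor scores for six census tracts.
tract_scores = [1.8, -0.2, -1.5, 0.4, -0.9, 0.9]
categories = tertile_categories(tract_scores)
print(categories)
```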
The values obtained with the factor analysis were then called the Health Inequity Index (HII). The results of the indexes by administrative areas show differences between the number of tracts with the worst HII. The west (51.9%) and north (44.9%) areas show higher proportions of tracts classified with the worst HII (Category 1), while the south (84.4%) and east (58.6%) areas have the highest proportions of tracts with the best HII (Category 3).
The Health Inequity Index shows a spatial pattern similar to that observed in the descriptive analysis, either at the level of districts (Figure 1-A) or by census tracts (Figure 1-B), with the worst indexes among tracts and districts in the west and north areas. Figure 2 shows the distribution of tracts within the districts of Potengi (Figure 2-A) and Pitimbu (Figure 2-B), located in the north and south, respectively. The details reveal inequalities within each district, with worse index values in some tracts, showing the heterogeneity that exists within larger areas such as districts.

Discussion
Mortality data broken down by Natal districts pointed out relevant disparities between them, generating a 25-year variation in the mean life span. Similar differences were found in neighborhoods in the city of São Paulo, where the mean age at death varied 24 years among the districts with the worst and best indicators 23 . Inequality also affects developed countries, not as ...
The difference between standardized overall mortality coefficients (OMC) was another measure used to assess differences in the pattern of mortality between districts. This indicator was used by Groenewald et al. 25 to promote health surveillance based on mortality data, at the local level, in a city in South Africa. The authors found the same disparity in the OMC between the city's sub-districts.
The use of OMC to analyze the local health situation showed the importance of analyzing mortality data through different measures, since the Guarapes district (west zone), with OMC below 5.0 deaths per 1,000 inhabitants, had a mean life span below 50 years. This fact suggests that even with a low number of deaths, the deaths that occur in this location must be reaching a population in the younger age group.
Mortality represented in PYLL by groups of causes contributes to broadening the view on mortality, emphasizing deaths that occur prematurely. These data reveal that the group of external causes has a high proportion in the mean PYLL among the locations with shorter mean life span, and this is because deaths from external causes (group 3) occur in younger age groups 17,26 . However, when analyzing the means of PYLL by all groups of causes, one can understand the configuration of the triple burden of diseases in Brazil, where chronic diseases, infectious diseases, and external causes are concomitant 27 . This becomes a challenge for health managers and policymakers in general.
The analysis of the distribution of PYLL means by groups of causes allows assessing not only the magnitude of premature deaths between districts but also the profile of these causes, which is closely related to the living conditions of the populations. The less developed the nation, the higher the share of infectious diseases and maternal, neonatal, and nutritional causes in premature mortality. The same pattern is not observed for external causes, whose repercussion on premature death is more pronounced in upper-middle-income countries 26 . In Brazil, this group of causes corresponds to the second cause of early mortality, and homicide is the leading cause among men 27 .
The worst living conditions, indicated by low per capita household income, persistent illiteracy in people over 15 years of age, and a high proportion of low-income households, are found in the districts with the worst mortality results: shorter life span and high PYLL for all causes.
The data referring to income inequality and illiteracy and their relationship with health outcomes were consistent with studies already carried out in Brazil and other countries in the world. Pickett and Wilkinson 28 performed a literature review to assess the causality between differences in income and health outcomes, showing the strong relationship between them.
The income difference, for example, helped explain higher mortality from chronic diseases among the poorest, including cardiovascular disease and cancer 29,30 . The analysis of premature death causes at a global level showed, in turn, that the combination of socioeconomic factors such as low income and lower schooling is related to a higher number of years lost due to infectious diseases and maternal, child, and perinatal causes 26 .
Income and schooling are combined again to determine higher mortality from homicides among the most impoverished population, in a Brazilian capital 31 . At the same time, inequalities in the educational level, alone, were responsible for the difference in mortality from external causes and chronic non-communicable diseases in other countries 32,33 .
As presented in this study, Groenewald et al. 25 analyzed premature mortality and socioeconomic factors at the intramunicipal level and found that the poorest sub-districts of Cape Town, South Africa, had the worst results of premature mortality, for the same three groups of causes analyzed here. The authors affirm that "the significant differences in mortality levels across the city highlight the importance of sub-district level information" 25 . Information on premature mortality should, therefore, facilitate the identification of public health priorities.
Besides the influences on people's health of income, schooling, occupation, and other determinants that can be measured at the individual level, studies that show the influence of the neighborhood on health results have gained relevance 10 . While Krieger et al. 11 pointed out the potential of using district-based social class measures in public health actions, notably in health surveillance, the research that followed contributed to the understanding of district effects on health, regardless of individual variables 13,34 .
Aggregate income data were used by Mode et al. 13 as a variable at the district level in a U.S. city, in a section smaller than a district (census tracts). These authors showed that the median household income evidenced the same results on mortality as an index measure that aggregated 19 independent variables. The mean household income per capita variable correlated with the mortality outcomes also among the districts of Natal, albeit with less magnitude than literacy and the proportion of low-income households.
When analyzing population census data at the aggregate level in the U.S., Muller 35 highlighted the educational level as the strongest predictor of differences in mortality. The proportion of women over 15 years of age with elementary education was used by Coelho and Dias 36 as a schooling variable to test its association with life expectancy among Brazilian municipalities. Moderate correlations were found between this variable and mortality among the districts of Natal using the proportion of literate as a measure of schooling, an alternative indicator to the years of study 37 .
The highest correlations, however, involved the variable proportion of low-income households. This indicator reveals the low income of the district, with lower influence of the income extremes compared with the mean household income per capita indicator, as it classifies and quantifies the households with income below a specified value. A high proportion of households in this condition may suggest poor subsistence conditions 37 .
The low income of households is even one of the determinants used in the definition of slums of the United Nations, which are regarded as a wide variety of low-income settlements or with poor living conditions 38 . The relationship between district effects and health outcomes was analyzed in longitudinal studies and also in a meta-analysis, all of which showed that the socioeconomic status measured at the aggregate level affects mortality and perceived health, regardless of individual factors 9,13,34 .
Meijer et al. 9 conducted a meta-analysis to try to overcome concerns about the socioeconomic effects of the district on health, with general mortality and the incidence of cancer as outcomes. The studies considered had to control their models for at least one individual socioeconomic status indicator. The relative risk for all-cause mortality was higher for the inhabitants of areas with worse socioeconomic conditions than for those who lived in areas with better conditions, with the effects controlled for individual socioeconomic status. While not controlled for variables at the individual level, the mortality results among Natal's districts also pointed to higher mortality in the areas with the worst socioeconomic indicators.
Chronic non-communicable diseases (NCD) have increased their share in the causes of death, holding a prominent place in the global burden of diseases that once was held by communicable diseases. This change in the characteristics of the leading causes of death would not, in itself, be a reason for the close relationship between socioeconomic conditions and PYLL due to NCDs.
The effects of socioeconomic conditions on health between locations were analyzed by Roux and Mair 10 , who highlighted that these influences are linked to the determinants of chronic diseases. Places with worse economic indicators tend to negatively affect the offering of spaces for physical activity and the use of available spaces. Moreover, people's choice of diet is influenced by the availability of markets that offer healthy or unhealthy foods, where the most degraded places tend to have fewer stores with a good variety of healthy foods. The district can also influence a higher or lower risk for obesity, diabetes, and hypertension, as well as depression and other mental disorders 39 .
Additionally, population aging has occurred at an accelerated rate in developing countries, and the mortality due to these causes mainly penalizes individuals who are under poor living conditions 39 . Besides experiencing this rapid change in demographic and epidemiological characteristics, Bollyky et al. 40 reveal that low- and middle-income countries are less prepared to deal with these changes.
The mean PYLL per group of communicable diseases, maternal, child, and perinatal causes also showed statistical significance in the correlation with socioeconomic variables, although the strength of the relationship was lower than the other groups. Campbell and Campbell 39 point to a reduced share of communicable diseases in the burden of morbimortality among developing countries in recent decades. Notwithstanding this, the data point to a mortality differential for this set of causes among the population groups in the city of Natal.
The Health Inequity Index showed variations in the territory of the city, with the worst situations located in the west and north of Natal. The areas corresponding to the southern and eastern administrative zones showed an opposite pattern, with the HII suggesting a lower effect of inequity on the resident population.
Indeed, the history of the occupation of the territory of Natal offers part of the evidence that explains the event. Its occupation has characteristics common to the Brazilian urbanization process 41 and to developing countries 39 . The growth towards the suburbs generates a reduced coverage of physical and social infrastructure, including primary and secondary health care services, roads, basic sanitation, among others 39 .
Recent studies that measured social vulnerability in the municipality of Natal showed that the most vulnerable areas of the city are located in the suburbs, notably in the western and northern administrative zones, in a pattern similar to that observed with the HII 42,43 . Understanding the occupation of urban space in cities helps to explain socio-spatial segregation, which will ultimately have an impact on health.
Once it is recognized that areas with worse socioeconomic indicators have higher mortality than areas with better conditions, the health sector and other areas that manage public policies should look at areas with fewer inhabitants, since they have a stronger association with mortality 9 . Census tract analysis within districts illustrates the importance of analysis at this territorial level. Even within a defined area like the district, different realities are seen between the smaller space units. Krieger et al. 11 point out that census tracts seek to aggregate relatively homogeneous populations concerning social and economic characteristics, and that this fact reveals pockets of poverty and wealth, which would otherwise be hidden.
The HII allowed the identification of health inequities in the capital Natal, and it can also be calculated for other Brazilian urban centers, indicating how public policies can reduce the deep-seated inequalities in each territory. Improving health surveillance actions is one of the relevant sectoral initiatives, but the mortality data must be more carefully recorded to achieve this.
It can be seen that the districts with the worst socioeconomic indicators had higher PYLL means in all three groups of causes studied.
These data suggest that the suburban areas of the city experience the triple burden of diseases: they have higher early mortality for the group of causes of infectious and parasitic diseases, as well as maternal, child, and perinatal causes; experience premature death from chronic non-communicable diseases; and suffer from external causes. This last group of causes, however, stood out due to the high mean PYLL in suburban districts, suggesting that violence in these areas requires priority actions aimed at tackling it.
The establishment of HII sought to consider all these variables in a single indicator, providing elements for formulators of sectoral and intersectoral health policies, given that it includes social stratifying factors such as income and schooling.
Despite this fact, the data presented in the study serve to point out to policymakers the ways to minimize the effects of social inequities on health, prioritizing the areas with the worst rates. It is also worth mentioning the possibility of other capitals and urban centers calculating the HII, identifying health inequities in their territories from the socioeconomic and mortality data already available.

Collaborations
MS Mata participated in the work conception and design, carried out the analysis and interpretation of the data, as well as the drafting of the paper. ICC Costa participated in the work conception, its critical review, and the approval of the version to be published.