Mortality due to garbage codes in Brazilian municipalities: differences in rate estimates by the direct and Bayesian methods from 2015 to 2017

REV BRAS EPIDEMIOL 2021; 24: E210003.SUPL.1
ABSTRACT: Objective: To generate estimates of mortality rates due to garbage codes (GC) for Brazilian municipalities by comparing the direct and the Bayesian methods, based on deaths registered in the Mortality Information System (SIM) between 2015 and 2017. Methods: Data from the SIM were used. The analysis was performed for GC levels 1 and 2, GC levels 3 and 4, and total GC. Mortality rates were estimated directly and also by the Bayesian method, applying the Empirical Bayesian Estimator. Results: About 38% of deaths were classified as GC, and regional differences in mortality rates were observed, with higher rates in the Northeast and Southeast and lower rates in the South and Midwest regions. The Southeast presented similar rates for the two analyzed groups of GC. The smallest differences between the direct and Bayesian estimates were observed in large cities with a population over 500 thousand inhabitants. Municipalities in the north of the state of Minas Gerais and those in the states of Rio de Janeiro, São Paulo, and Bahia presented high rates at levels 1 and 2. Conclusion: There are differences in the quality of the definition of the underlying causes of death, even with the use of Bayesian methodology, which assists in smoothing the rates. The quality of the definition of causes of death is important, as it is associated with the access to and quality of healthcare services and supports health planning.


INTRODUCTION
In health planning, the availability of good-quality data is essential. Mortality data are among the best known and most used, as they describe the health condition of a population by age, sex, place of residence, and cause of death 1 . Based on these data, it is possible to identify populations at risk, the most frequent causes of death, premature or preventable deaths, and temporal trends and, thus, to define priorities and interventions. However, for the generated information to adequately support the planning and direction of healthcare actions, with a consequent impact on the population's health profile, it is essential for the databases to have good coverage and quality 1 . In this sense, the Brazilian Ministry of Health carried out actions to improve the information on national vital statistics. These actions aimed at:
• improving the capture of deaths by the Mortality Information System (Sistema de Informações sobre Mortalidade, SIM), through measures such as investments in the training of healthcare teams, expansion of the coding of causes of death, search for and legalization of clandestine cemeteries, awareness of managers regarding the underreporting of deaths, expansion of family health teams, and hiring of doctors for the interior of the country via Programa Mais Médicos para o Brasil (More Doctors Program), among others 2,3 ;
• reducing deaths from ill-defined causes (chapter 18 of the 10th edition of the International Statistical Classification of Diseases and Related Health Problems, ICD-10) and from other garbage codes (GC) 4 , that is, groups of underlying causes of death (UCD) deemed incorrect or nonspecific, such as UCD declared as sepsis or cardiac arrest.
Thus, the group of causes called "GC" is considered an indicator of the quality of health information: the lower its occurrence, the better the quality of these data [4][5][6] .
The magnitude of GC has been analyzed according to regions, states, capitals, and groups of municipalities 4,7 . Knowing its distribution at the municipal level is important for planning local actions and reducing the occurrence of deaths from these causes. Studies from the Global Burden of Disease (GBD) have pointed out that poorer countries and locations have worse health indicators and lower quality of databases as well as a higher proportion of GC 8 . Also in Brazil, the Busca Ativa de Óbitos study (Proactive Search of Deaths) has already identified worse SIM quality in small municipalities of the North and Northeast regions of the country 9 .
It is assumed that in smaller municipalities and in poorer regions the proportion of GC is higher. Considering that, of the 5,570 municipalities in the country, 88% had a population of less than 50 thousand inhabitants in 2010 10 , it is a great challenge to directly calculate mortality rates in these locations due to instability and great variability in the estimates. Thus, as an alternative, some authors have been using Bayesian methods, such as the Empirical Bayesian Estimator, in order to estimate mortality rates in municipalities 11,12 . Therefore, this study aimed at generating estimates of mortality rates due to GC for Brazilian municipalities by comparing the direct and the Bayesian methods, based on deaths registered in SIM between 2015 and 2017.

METHODS
This is a descriptive study using public data from SIM from 2015 to 2017. The analysis considered the municipality of residence. To minimize fluctuations in the number of deaths at the municipal level, where small numbers generate high variability in the rates, deaths were pooled over the three-year period.
The selection of GC was based on the GBD 2017 study 13 , which classified the defined causes into three major groups: communicable, maternal, neonatal, and nutritional diseases; non-communicable diseases (NCD); and external causes. In addition to these groups, the GBD study defines four levels of GC, ordered from level 1, the worst in terms of the quality of the definition of causes, to level 4. They are as follows 14 :
• Level 1: GC that can be redistributed to any of the three major groups of defined causes in the GBD study mentioned above. For instance, sepsis, which is coded as a GC, can result from a death due to a transport accident, an infectious disease such as pneumonia, or a chronic disease such as cancer;
• Level 2: GC that are redistributed to one major group, or at most to a second group (for instance, a UCD defined as gastrointestinal bleeding, unspecified, should be redistributed only to the group of non-communicable diseases);
• Level 3: GC whose true causes are likely to be in the same ICD-10 chapter. For example, unspecified cancer, although lacking the specificity of type or organ, is attributed to the disease and redistributed within the same group of specific causes of cancer;
• Level 4: GC in which the UCD probably refers to a single disease, such as unspecified stroke, which may be ischemic or hemorrhagic, or unspecified diabetes, which can be redistributed as type I or type II.
Thus, GC levels 1 and 2 are considered the most problematic and may have the greatest impact on the quality of statistics on causes of death, as they are highly nonspecific and contain little information on the actual UCD 13,14 . In the present study, the analyses were made for total GC and for two groups: levels 1 and 2, and levels 3 and 4.
For a more adequate comparison, the limitations in the local quality of UCD recording were taken into account: before the SIM was used at the municipal level, the data were treated to improve their quality and to level out these limitations. The first step was the treatment of missing data. Records with missing age, sex, or municipality of residence were proportionally redistributed, the last of these within the Federative Unit (FU) 2 .
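The proportional redistribution described above can be illustrated with a minimal sketch. It assumes that deaths with a missing attribute are allocated to the known categories in proportion to the deaths already recorded in each of them; the function name and the example categories are hypothetical and are not taken from the study.

```python
def redistribute_missing(known_counts, n_missing):
    """Allocate deaths with a missing attribute proportionally.

    known_counts: dict mapping a category (e.g., an age group) to the
                  number of deaths with that category recorded.
    n_missing:    number of deaths for which the attribute is missing.
    Returns a dict with the same keys and fractional adjusted counts.
    Assumption: allocation follows the observed distribution of deaths.
    """
    total = sum(known_counts.values())
    if total == 0:
        # No observed distribution to follow: leave the counts unchanged.
        return dict(known_counts)
    return {
        category: count + n_missing * count / total
        for category, count in known_counts.items()
    }


# Hypothetical example: 4 deaths with unknown age in a given FU.
adjusted = redistribute_missing({"0-59": 10, "60+": 30}, 4)
```

The total number of deaths is preserved: the adjusted counts add up to the recorded deaths plus the redistributed ones.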
Next, a correction was applied for unregistered deaths (underreporting), taking into account the heterogeneity of SIM coverage in the country 15 . The GBD 2017 correction was used according to sex, age, and FU. The correction coefficient was the ratio between the deaths estimated by the GBD and those observed in the SIM, by state. It was applied only in municipalities with a general mortality rate of less than five deaths per 100 thousand inhabitants, thus avoiding overestimation in municipalities whose death records were classified as of good quality 16 . Municipalities whose coefficient was less than 1 were not corrected.
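The correction rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the threshold is passed by the caller in the same unit as the general mortality rate, and the "value" below 1 is read here as the correction coefficient.

```python
def correction_coefficient(gbd_estimated_deaths, sim_observed_deaths):
    """Ratio between deaths estimated by the GBD and deaths observed in SIM."""
    return gbd_estimated_deaths / sim_observed_deaths


def corrected_deaths(deaths, general_rate, coefficient, rate_threshold):
    """Apply the state-level coefficient to a municipality's deaths.

    The correction is applied only when the municipality's general
    mortality rate is below the threshold (suggesting underreporting)
    and the coefficient is greater than 1; otherwise the observed
    deaths are kept unchanged.
    """
    if general_rate < rate_threshold and coefficient > 1:
        return deaths * coefficient
    return deaths


# Hypothetical numbers: the GBD estimates 1,500 deaths where SIM has 1,000.
coef = correction_coefficient(1500, 1000)
low_rate_city = corrected_deaths(10, 3.0, coef, rate_threshold=5.0)
high_rate_city = corrected_deaths(10, 6.0, coef, rate_threshold=5.0)
```

Restricting the correction to municipalities with low general mortality rates keeps a state-level coefficient from inflating the counts of municipalities whose records are already complete.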
The estimates of mortality rates were prepared with the Empirical Bayesian Estimator (EBE) 12,[17][18][19] . This methodology considers the distribution of rates in the neighborhood of each municipality, which minimizes the effects of the small denominators found in small populations. In these municipalities, a single death can considerably change the estimated mortality rate. In addition, EBE makes it possible to estimate rates in places with no recorded deaths, that is, to calculate risks where the observed event count is zero, by borrowing information from neighboring areas. In this study, the eight closest neighbors of each evaluated municipality were considered.
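To make the smoothing idea concrete, the sketch below implements one common formulation of a local empirical Bayes estimator, the moment-based shrinkage usually attributed to Marshall. The study does not spell out its formulas, so this choice, the use of Euclidean distance between municipality centroids to find the nearest neighbors, and all names in the code are assumptions made for illustration only.

```python
import numpy as np


def local_empirical_bayes(deaths, population, coords, k=8):
    """Shrink each crude rate toward the rate of its local neighborhood.

    deaths, population: 1-D arrays with one entry per municipality.
    coords: (n, 2) array of centroid coordinates (assumed input).
    k: number of nearest neighbors (eight in the study).
    """
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    coords = np.asarray(coords, dtype=float)
    crude = deaths / population

    # Pairwise Euclidean distances; the diagonal is set to -1 so that
    # each municipality is always the first member of its neighborhood.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, -1.0)
    neighborhoods = np.argsort(dist, axis=1)[:, : k + 1]

    smoothed = np.empty_like(crude)
    for i, nbrs in enumerate(neighborhoods):
        n = population[nbrs]
        # Neighborhood mean rate (prior mean).
        m = deaths[nbrs].sum() / n.sum()
        # Population-weighted variance of the crude rates around m.
        s2 = (n * (crude[nbrs] - m) ** 2).sum() / n.sum()
        # Moment estimate of the prior variance, truncated at zero.
        a = max(s2 - m / n.mean(), 0.0)
        denom = a + m / population[i]
        # Shrinkage weight: close to 1 for large populations, near 0 for small.
        w = a / denom if denom > 0 else 0.0
        smoothed[i] = m + w * (crude[i] - m)
    return smoothed
```

In this formulation, a small municipality with no recorded deaths receives a rate close to the mean of its neighborhood instead of zero, while a large city keeps a rate close to its own crude rate. This matches the behavior described in the text.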
As in the direct method, Bayesian estimators take the number of deaths and the population as parameters for calculating mortality rates. However, it is known that advanced ages have high mortality rates due to GC. To minimize the effect of the age distribution of the municipal population, age-standardized rates were calculated, using the population of the 2010 Census 20 as the standard and the absolute values of expected deaths. The number of inhabitants of each municipality, by sex and age, was estimated with the demographic cohort-component method for population projections, combined with an empirical Bayesian shrinkage estimator to minimize the instability in the estimates of the method's differential growth factors in smaller areas 21 . Thus, the rates estimated by the direct method (crude rate) and by the Bayesian method (Bayesian rate) used the expected number of deaths, derived from the municipal standardized rates, and the respective population over the three-year period from 2015 to 2017.
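The standardization step can be illustrated with a minimal sketch of direct age standardization. It assumes that the expected deaths are obtained by applying the municipal age-specific rates to the standard population. The function name, the two age groups, and the numbers are hypothetical.

```python
def age_standardized_rate(deaths_by_age, pop_by_age, std_pop_by_age, per=100_000):
    """Direct age standardization.

    deaths_by_age, pop_by_age: deaths and inhabitants of the municipality
                               in each age group (same order).
    std_pop_by_age: standard population in each age group
                    (e.g., the 2010 Census population).
    Returns the standardized rate per `per` inhabitants.
    """
    expected = 0.0
    for d, p, s in zip(deaths_by_age, pop_by_age, std_pop_by_age):
        if p > 0:
            # Deaths expected in the standard population if it experienced
            # the municipality's age-specific rate.
            expected += d / p * s
    return per * expected / sum(std_pop_by_age)


# Hypothetical municipality with two age groups.
rate = age_standardized_rate([1, 10], [1000, 1000], [3000, 1000])
```

Because every municipality is weighted by the same standard age structure, differences between standardized rates no longer reflect differences in the proportion of older inhabitants.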
Municipal descriptive analyses for total GC and for the GC groups were generated according to the regions of Brazil. Histograms were used to show the absolute differences between the rates estimated by the Bayesian method, with the use of EBE, and by the direct method (the expected number of deaths divided by the population), in the municipalities and according to population size. Finally, maps were prepared showing the spatial distribution of the estimated Bayesian mortality rates in Brazilian municipalities.

RESULTS
Among the GC subgroups, GC levels 1 and 2 in the Southeast stand out, with 125.7 (123.8; 127.5), for having presented a rate similar to that of levels 3 and 4, with 128.6 (127.7; 129.5). It was also verified that the means of the rates estimated by the Bayesian method reached higher values than those estimated by the direct method, but with less variability, as the standard deviation and amplitude were lower (descriptive statistics not shown in Table 1). The states with the lowest mortality rates for GC in each region were Amapá (177.1), Rio Grande do Norte (224.3), Espírito Santo (169.3), Rio Grande do Sul (176.8), and the Federal District (140.5), in the North, Northeast, Southeast, South, and Midwest regions, respectively (values not shown in the table).
Figure 1 shows the means and 95% confidence intervals of the differences between the Bayesian and direct rates in Brazil and its regions. The Midwest, the North, and the South regions had the highest means. GC levels 3 and 4 showed the greatest mean differences.
Figure 2 shows the histograms of the direct and Bayesian mortality rates for total GC and groups. The decrease in the variability of mortality rates after the correction is noteworthy, as the frequencies are more concentrated in the center of the histogram for this indicator.
Figure 3 shows the histograms of the absolute differences between the Bayesian rates and the crude rates for total GC and groups according to Brazilian regions, with the population size of the municipalities differentiated by color. Large cities have differences close to zero, especially those with a population over 500 thousand inhabitants. Conversely, smaller municipalities are spread along the whole x-axis, i.e., they have a more heterogeneous distribution. There is a large number of municipalities with a population of less than 10 thousand inhabitants in the South of the country, and in this region in particular they stand out with values that are more distant from zero.
In the Northeast and Southeast regions, high frequencies above zero are observed in cities with 10 to 50 thousand inhabitants for the differences in the rates of total GC. The geographic distribution of municipal mortality rates for total GC and the investigated subgroups is shown in Figure 4. In the legend, the darker the color, the higher the mortality rate in the municipality. For total GC (Figure 4C), the highest concentration of dark colors is found in the Southeast region, mainly in the north of Minas Gerais and in the states of Rio de Janeiro and São Paulo, and in the Northeast region, mainly in the state of Bahia. On the other hand, the state of Espírito Santo and the South and Midwest regions stand out for the presence of lighter colors, i.e., lower rates. The map of rates for GC levels 1 and 2 shows a geographic distribution similar to that of total GC (Figure 4A).
Finally, the distribution of mortality rates for GC levels 3 and 4 (Figure 4B) appears random throughout the country, with no visual patterns being identified, not even in Espírito Santo, which stood out for a pattern of lower rates for total GC and GC levels 1 and 2.

DISCUSSION
The results of the present study highlight the high proportion of GC in the country, with more than a third of deaths thus classified, distributed as 12.9, 4.5, 4.1, and 17.1% among levels 1 to 4, respectively, in the three-year period from 2015 to 2017. The Northeast and Southeast regions had the highest Bayesian death rates from total GC, and the lowest rates of total GC were observed in the South and Midwest regions. Smaller municipalities concentrate higher rates of GC. The heterogeneity of the quality of mortality data in the period from 2015 to 2017, considering GC as quality indicators, is evident.
The importance of correcting the data and using the EBE method to calculate mortality rates for GC in Brazilian municipalities is emphasized. These results may be related to difficulties of access to health and the scarcity of resources in health care, including the quality of provided services and diagnoses, as these are factors that negatively interfere in the accuracy of the definition of UCD 22 . Furthermore, the use of EBE for small areas minimizes fluctuations, considering the observed regional realities, as the neighboring municipalities are taken into account to calculate the estimates of final rates.
Although SIM has been considered a source of good-quality data in recent years, this characteristic is regionally differentiated. In addition, the analysis of mortality rates in small populations can generate high variability, as small numbers can considerably change the mortality rate. Thus, some methodological aspects of the present study should be highlighted.
The treatment of the raw SIM data and the methodology used to work with small areas allowed the analysis of mortality rates, minimizing the random fluctuations in the spatial distribution of rates between municipalities. First, the redistribution of missing data and the treatment of SIM underreporting of deaths brought the level of quality of raw data closer to the municipal level. As the use of correction coefficients developed for states in GBD studies 13 can generate values that do not correspond to the municipal reality, corrections were applied in this study only to cities where the overall mortality rates were considered lower than expected 16 . In addition, the effect of differences in age distribution was removed by using age-standardized rates. Finally, the use of EBE, when considering information from neighbors in the rate estimates, enabled estimates without random spatial changes.
Smoothing in the estimates of mortality rates by the Bayesian method, which takes into account the neighboring municipalities to generate the estimates, applied in this study, proved to be adequate, considering the heterogeneity in the quality of the mortality data and the large number of small cities in Brazil 10 .
As expected, Bayesian estimates showed less variability than the direct ones, as in smaller municipalities a significant correction is expected due to the weight of larger neighboring cities 19 . Furthermore, when dealing with small numbers, the direct method yielded municipalities with no deaths from GC, which means a risk of mortality equal to zero. This may not be in line with the local reality, considering that fluctuations caused by small numbers can interfere with rate estimates obtained by direct methods 12 .
Because the Bayesian estimates use the values of the neighboring municipalities, weighted by population size, greater differences between the rates estimated by the Bayesian and direct methods were observed especially in smaller municipalities. This can be verified in Figure 3, in which differences far from zero between the rates of the direct and Bayesian methods are considerably frequent in the municipalities with smaller population sizes.
The analysis by region also showed that the quality of the mortality data, considering total GC, presents substantial regional differences. There were lower rates in the South and Midwest compared with other regions. These results corroborate the hypothesis that more developed areas have better quality in the definition of causes of death. Access to health and the quality of these services are factors that can contribute to the better definition of UCD 23,24 .
An unexpected finding was that the Southeast region had the second highest Bayesian mortality rate due to total GC, only lower than that of the Northeast. Moreover, it was noted that the rates of the GC groups of levels 1 and 2 and of levels 3 and 4 showed very close values. The group of GC levels 1 and 2 represents deaths with little information for an adequate definition of UCD. In this group, according to GBD 2017 13 , there are deaths to be redistributed among all defined causes such as, for example, R98 (unattended death) and R99 (other ill-defined and unspecified causes of mortality). Causes like these do not provide information to support health management, as they do not make it possible to target preventive actions. Considering that the Southeast region is one of the most economically developed in the country 25 , with a higher number of larger municipalities and, therefore, with better access to healthcare services on the part of the population 26 , as is the South region, a hypothesis to be raised is that the high population density is preventing a more accurate diagnosis for the definition of UCD.
Regional differences can affect the comparability of mortality indicators for specific groups of causes of death that are incorrectly classified as GC. Such differences may still be due to the different types of GC, because, depending on the location, there may be a predominance of GC more related to the group of communicable diseases, chronic non-communicable diseases, or external causes 27 . In addition, it is very likely that, depending on the FU or municipality, there will be variations in the certification and coding of the causes of death 28 .
According to the GBD 2017 study 13
Despite the improvement already observed in the quality of mortality data 6,29 , the analysis by municipality shows that mortality rates due to GC are heterogeneous in the country. This analysis (Figure 4) shows a spatial distribution with regionalized groups of cities and points out areas with high rates, demonstrating intraregional inequalities. Municipalities with the highest rates were concentrated in northern Minas Gerais and southern Bahia, areas with groups of municipalities whose total GC rates fell in the same category, represented by a darker color. These areas are characterized by poor socioeconomic development 30 . Possibly, these regions, in addition to being more distant from the capitals, which are reference centers, are located far from regional healthcare centers, which increases the chance of nonspecific diagnoses 29 . In contrast, municipalities in the South region, where places with higher socioeconomic development and high supply and complexity of healthcare services are concentrated, showed lower rates, represented by a lighter color 30 .
In the analysis by types of GC (Figures 4A and 4B), it is verified that GC levels 3 and 4 have a heterogeneous distribution throughout the country, whereas GC levels 1 and 2 (the most serious ones) have few, but important, points with high rates in the South and Midwest regions. These findings point to the need to locally prioritize plans aimed at reducing deaths certified as GC, such as better access to healthcare services and diagnosis and the improvement of death surveillance.
Although no correlation analysis was performed, the maps suggest that other factors may be related to the observed clusters of municipalities, such as the coverage of Programa Saúde da Família (Family Health Program) and socioeconomic level, among others. This calls for an in-depth study to evaluate the relationship between the findings and the aforementioned characteristics of these places. To better understand this situation, analytical methodologies that use different data sources are suggested to identify significantly associated factors.
The results of this study show differences in the quality of the definition of UCD, observed with the use of an adequate methodology for the analysis of small areas. When investigating differences between the crude and estimated rates, with the aid of the Bayesian methodology, it was possible to verify that the situation of GC in Brazil, even with a methodology for smoothing it, is still regionally differentiated. The quality of the definition of UCD is extremely important for public health, considering that it is associated with the access to and quality of healthcare services and supports health planning. Therefore, an analysis of small areas is very important for the actors responsible for health management in the country.