Incomplete recording of race/colour in health information systems in Brazil: time trend, 2009-2018

This ecological study of time trends and multiple groups evaluated incompleteness in the race/colour field of Brazilian health information system records and the related time trend, 2009-2018, for the diseases and disorders most prevalent in the black population. The Romero and Cunha (2006) classification was applied in order to examine incompleteness using secondary data from Brazil’s National Notifiable Diseases System, Hospital Information System and Mortality Information System, by administrative regions of Brazil. Percentage underreporting was calculated and its time trend estimated using simple linear regression models with Prais-Winsten correction (significance at p-value<0.05). All records scored poorly except those for mortality from external causes (excellent), tuberculosis (good) and infant mortality (fair). An overall downward trend was observed in percentage incompleteness. Analysis by region found highest mean incompleteness in the North (30.5%), Northeast (33.3%) and Mid-West (33.0%) regions. The Southeast and Northeast regions showed the strongest downward trends. The findings are intended to increase the visibility of the implications of the race/colour field for health equity.


Introduction
Race/colour has been described as a distinct exposure factor for risk of illness and death and a significant indicator of social inequalities 1. In the health context, it is an indispensable variable with incalculable strategic significance, because it allows epidemiological profiling of ethnic-racial groups and highlights institutionalised racism in health practices and services 2.
Official Brazilian race/colour statistics use the categories white, black, brown, yellow and indigenous 3. Use of race/colour as a variable in health information systems is complicated by a series of political, ideological and social factors 4. How such information is captured, processed and used depends on the social and political circumstances in which it is produced 5. In this study, the expression "race/colour" is used in view of the conceptual issues surrounding ethnic-racial classification in Brazil.
In semantic terms, the category "race" is understood as a social construct that enables identities to be determined and resources, goods and social values to be accessed 6. Colour is considered the primary trait of ethnic-racial differences and a substantial predictor of many forms of perceived discrimination 7. In Brazil, race/colour is acknowledged to be a key analytical category, which operates as a marker of discrimination 7,8 and social inequality 9.
The use of the race/colour variable as a tool for analysing social stratification reflects the centrality of fragmentation in Western societies 8. The concept of race/colour is socially dynamic, complex and multidimensional, conditioned by variations in time, place and context 10, and is particularly important to decision-making, to the implementation of racial equality policies and to combating discrimination and institutional racism 11.
Ethnic-racial inequities have been studied increasingly in the past twenty years 7. These inequities reflect historical and permanent social inequities 12. From that perspective, it is of capital importance to collect data and conduct scientific social research focused on racial inequalities 13. Accordingly, not using the race/colour variable impedes the production of important information for identifying and monitoring health conditions, vulnerabilities and ethnic-racial inequities 2. Moreover, failure to collect data serves to maintain the status quo and raises barriers to understanding and addressing inequalities at the institutional level 13.
Brazil's national health information and information technology policy states that the use of race or ethnicity in information production is of substantive importance to reducing health inequalities 14. Accordingly, race/colour has become a required variable in the country's health information systems (HISs). Mandatory inclusion of the race/colour variable in HISs and in research forms dates from the 1990s and is framed in important legal documents, including National Health Council Resolution 196/1996 15, the National Plan for the Promotion of Racial Equality 16, the National Policy for the Comprehensive Health of the Black Population (Política Nacional de Saúde Integral da População Negra, PNSIPN) 17 and Ministerial Order No. 344 of February 1, 2017 18. Black activists and researchers have made incisive contributions to this whole legal framework, which rests on recognition of the health repercussions of racism and the importance of producing information by ethnic-racial criteria in order to promote health equity and reduce social inequities 19.
Analysis of epidemiological data disaggregated by race/colour reveals the effects of racism and racial inequities on the black population's health 20. This population's socioeconomic vulnerability, which advocates of racial democracy, Marxists and social epidemiologists mistakenly associate with non-racial, social production, results from an intergenerational, historical legacy of low human capital imposed on this population 21.
The HISs provide management support in that they include data, information, knowledge, communication and action, which make it possible to understand the phenomenon and to take strategic, communicative action to promote social inclusion 22. Action to tackle the social determinants of health will be more effective if HISs are properly implemented and if mechanisms are set up to ensure that quality information is used in developing public policies 23. To that end, historical records provided by HISs permit research, characterised methodologically as time series, to examine trends and forecast medium- and long-term events 24.
One way to evaluate these records and gauge the quality of their data is by the completeness and incompleteness of race/colour information, corresponding respectively to the degree of completion and non-completion of the record field in question 25. In information systems, properly completed fields underpin the construction of quality databases, offering the possibility of stratification and close analysis of reality 26.
Considering the centrality of the ethnic-racial dimension in the Brazilian context and the importance of recording race/colour for monitoring health inequalities 27, this study evaluated the incompleteness and time trend of incompleteness in recording race/colour in HIS records, from 2009 to 2018, for the diseases and disorders most prevalent in the black population.

Material and methods
For this mixed (time trend and multiple group) ecological study, 2009 was chosen as the baseline year, corresponding to the official introduction of the PNSIPN. The study used secondary data relating to diseases and disorders listed by the PNSIPN as most prevalent in the black population and included in Brazil's national notifiable diseases system (Sistema de Informação de Agravos de Notificação, SINAN), hospital information system (Sistema de Informações Hospitalares, SIH) and mortality information system (Sistema de Informações sobre Mortalidade, SIM) (Chart 1), for Brazil and Brazilian regions, from 2009 to 2018.
Data were collected from the website of the Unified Health System (Sistema Único de Saúde, SUS) information technology department (DATASUS) and all data relating to diseases and disorders were collected by place of residence. Data analysis by region covered all diseases and disorders, which were treated individually. The extracted data were exported to an electronic spreadsheet in Excel, version 16.0.
Percentage incompleteness was given by the ratio between the number of records with the race/colour field unknown, blank or without information and the total number of records in the year, multiplied by 100. Scoring followed Romero and Cunha 28, a classification widely used in studies for this purpose 29, in which non-completion of the race/colour field is classified as: excellent (≤5%); good (>5% to <10%); fair (10% to <20%); poor (20% to <50%); and very poor (≥50%). Absolute, relative and average frequencies and respective standard deviations (mean ± standard deviation) were calculated.
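The incompleteness measure and scoring described in this paragraph can be sketched as follows. This is a minimal Python illustration rather than the authors' own code (the analyses were run in a spreadsheet and in R). The function names are hypothetical, and the treatment of values falling exactly on a class boundary is an assumption based on the cut-offs quoted above.

```python
def incompleteness_pct(n_missing, n_total):
    """Percentage of records whose race/colour field is unknown, blank or without information."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_missing / n_total


def romero_cunha_score(pct):
    """Map a percentage incompleteness to the five-level score.

    Boundary handling (e.g. exactly 5% counted as excellent) is an assumption.
    """
    if pct <= 5:
        return "excellent"
    if pct < 10:
        return "good"
    if pct < 20:
        return "fair"  # this level is also rendered as "regular"
    if pct < 50:
        return "poor"
    return "very poor"


# HIV/AIDS counts quoted in the Discussion: race/colour unknown in
# 148,682 of 404,938 cases recorded from 2009 to 2018.
hiv = incompleteness_pct(148_682, 404_938)
print(round(hiv, 1), romero_cunha_score(hiv))
```

Applied to the mean values reported in the Results, the same function places mortality from external causes (4.3%) in the excellent class, tuberculosis (8.1%) in good and infant mortality (10.2%) in fair.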
In the time trend analysis, the aggregate measures estimated as a function of time were the percentage values for incompleteness of the race/colour field in records of diseases and disorders most prevalent in the black population, by year. The independent variable was the year and the dependent variable was percentage incompleteness of race/colour, by disease or disorder. The time series trend was analysed using a simple linear regression model with Prais-Winsten correction 30. Increasing or decreasing trend behaviour was estimated by calculating the annual percentage change (APC) and respective 95% confidence intervals (95%CIs), constructed using the quantile of the Student-t distribution at n-2 degrees of freedom. Results with p-value <0.05 were considered statistically significant: APC>0 and p<0.05 denoted a significant increasing trend; APC<0 and p<0.05, a significant decreasing trend; and p≥0.05, no trend, according to the model adopted. The analyses were carried out in R 31, version 3.6.3 for Windows.
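A trend analysis of this kind can be sketched in a few lines. The example below is an illustrative pure-Python version, not the authors' R code, and it rests on several assumptions that the text does not state: the yearly percentages are log10-transformed before regression; the errors follow a first-order autoregressive process whose coefficient is re-estimated iteratively; APC is obtained as (10^b1 - 1) x 100, where b1 is the slope; and a trend is called significant when the 95% CI excludes zero, which is equivalent to a two-sided p<0.05 for the slope. The function name `prais_winsten_apc` and the default `t_crit` (the 0.975 Student-t quantile for 8 degrees of freedom, i.e. ten annual points) are hypothetical choices for illustration.

```python
import math


def prais_winsten_apc(years, pct, t_crit=2.306, max_iter=50, tol=1e-8):
    """Return (APC %, (lower, upper) 95% CI, trend label) for a yearly series.

    Assumes log10-transformed percentages and AR(1) errors; t_crit must be
    the 0.975 Student-t quantile for n-2 degrees of freedom (2.306 for n=10).
    """
    n = len(years)
    mean_year = sum(years) / n
    x = [yr - mean_year for yr in years]  # centred for numerical stability
    y = [math.log10(p) for p in pct]

    def fit(rho):
        # Prais-Winsten transform: the first row is scaled by sqrt(1 - rho^2),
        # the remaining rows are quasi-differenced.
        w = math.sqrt(1.0 - rho * rho)
        c = [w] + [1.0 - rho] * (n - 1)  # transformed intercept column
        xs = [w * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
        ys = [w * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
        scc = sum(a * a for a in c)
        scx = sum(a * b for a, b in zip(c, xs))
        sxx = sum(b * b for b in xs)
        scy = sum(a * b for a, b in zip(c, ys))
        sxy = sum(a * b for a, b in zip(xs, ys))
        det = scc * sxx - scx * scx
        b0 = (sxx * scy - scx * sxy) / det
        b1 = (scc * sxy - scx * scy) / det
        ssr = sum((yy - b0 * cc - b1 * xx) ** 2 for yy, cc, xx in zip(ys, c, xs))
        se_b1 = math.sqrt(ssr / (n - 2) * scc / det)
        return b0, b1, se_b1

    rho = 0.0
    for _ in range(max_iter):
        b0, b1, _ = fit(rho)
        e = [y[t] - b0 - b1 * x[t] for t in range(n)]  # residuals on the original scale
        denom = sum(v * v for v in e)
        new_rho = 0.0 if denom < 1e-20 else sum(e[t] * e[t - 1] for t in range(1, n)) / denom
        new_rho = max(-0.99, min(0.99, new_rho))  # keep the transform well defined
        if abs(new_rho - rho) < tol:
            break
        rho = new_rho

    _, b1, se_b1 = fit(rho)
    apc = (10 ** b1 - 1) * 100
    lower = (10 ** (b1 - t_crit * se_b1) - 1) * 100
    upper = (10 ** (b1 + t_crit * se_b1) - 1) * 100
    if lower > 0:
        trend = "increasing"
    elif upper < 0:
        trend = "decreasing"
    else:
        trend = "no trend"
    return apc, (lower, upper), trend


# Synthetic series (not study data): incompleteness falling by 10% per year.
years = list(range(2009, 2019))
series = [30 * 0.9 ** k for k in range(10)]
print(prais_winsten_apc(years, series))
```

Centring the years leaves the slope unchanged and avoids loss of precision when solving the normal equations. For series of a different length, `t_crit` would need to be replaced with the quantile for the corresponding n-2 degrees of freedom.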

Results
During the study period, 7,734,245 case records of the diseases and disorders most prevalent in the black population were found, of which 1,732,300 (22.3%) had the race/colour variable recorded as unknown, blank or no information. Overall, average incompleteness in relation to race/colour, for all diseases, deaths and disorders recorded in Brazil, was approximately 24.4±10.9%. By HIS, average incompleteness was 22.5±15.1% in the SINAN; 29.2±4.6% in the SIH; and 7.3±3.3% in the SIM.
The diseases and disorders with lowest average percentage incompleteness of race/colour information in Brazil were death from external causes (4.3±1.3%), tuberculosis (8.1±1%) and infant mortality (10.2±1.8%). On the other hand, the highest average incompleteness for race/colour was observed in records for HIV/AIDS (37.0±6.4%), diabetes mellitus (32.8±3.8%) and other anaemias (31.9±4.8%) (Figure 1). Against the pre-established evaluation criteria 28, all records of diseases and disorders analysed presented a poor score for race/colour, with the exception of records of mortality from external causes (excellent), tuberculosis (good) and infant mortality (fair) (Chart 1).
Analysis of race/colour incompleteness for all diseases and disorders, by region, found lowest average incompleteness in the South (17.0%) and Southeast (19.8%) and highest in the North (30.5%), Northeast (33.3%) and Mid-West (33.0%).
The nationwide trend in incompleteness of race/colour information was decreasing for all diseases and disorders, except HIV/AIDS (p-value=0.0203), for which incompleteness rose from 34.4% in 2009 to 49.9% in 2018. All time series for genetically determined and acquired diseases and disorders resulting from unfavourable conditions were statistically significant, with the exception of the series for mental and behavioural disorders due to the use of other substances (mental disorders I), mental and behavioural disorders due to alcohol use (mental disorders II) and infant mortality, which, on the model adopted here, showed no significant trend. The largest downward trend, as measured by APC, was observed in mortality from external causes (-10.1%; p-value=0.0125), where incompleteness fell from 5.3% in 2009 to 2.0% in 2018 (Figure 1). The administrative regions that returned the largest decreasing trends in incompleteness were the Southeast and Northeast.
The North region returned upward trends in incompleteness of records for mental disorders I (from 40.0% in 2009 to 52.7% in 2018), mental disorders II (31.4% in 2009, 58.4% in 2014 and 32.7% in 2018), HIV/AIDS (from 39.4% to 56.4%) and tuberculosis, which showed a very small increase, from 2.6% to 3.2% (p-value=0.0210) (Figure 2). Prominent in this region were mental disorders I, with the highest average incompleteness (48.8±12.7%), and external causes, with the lowest (2.5±0.6%).
The Northeast region showed decreasing trends in incompleteness for almost all diseases and disorders, with the exception of HIV/AIDS, which rose from 37.8% in 2009 to 49.0% in 2018, and tuberculosis, which returned 6.9% in 2009, 5.8% in 2013 and 7.3% in 2018 (Figure 3). Underreporting of race/colour for most morbidities was observed to decrease in the last five years of the series (2014-2018). Highest average incompleteness was observed for renal failure (50.9±7.3%) and lowest for tuberculosis (7.0±0.7%).
In the Southeast region, decreasing trends were found in incompleteness for all cases except HIV/AIDS (p-value=0.0030), where percentage incompleteness increased from 37.2% to 51.1% in the study period (Figure 4). Incompleteness increased for most diseases and disorders in 2011 and 2012. This increase was observed in records for iron deficiency anaemia (18.7% to [...]
The South region showed an increasing trend in percentage underreporting in cases of iron deficiency anaemia (p-value=0.0007), which rose from 21.7% in 2009 to 33.2%; tuberculosis, which showed a small increase in the study period, from 2.6% to 3.2%; and HIV/AIDS (p-value=0.0076), which increased from 25.4% to 42.0%, returning the highest APC (6.06%) (Figure 5). Highest average incompleteness was observed in records for iron deficiency anaemia (28.5±3.5%) and the lowest, for external causes (1.4±0.1%).
In all regions, HIV/AIDS returned the highest percentage incompleteness and strongest increasing trend.

Discussion
In this study, average incompleteness of data on diseases and disorders was found to be poor overall, according to the classification of Romero and Cunha 28. Lowest percentage incompleteness of the race/colour variable was found in records for mortality from external causes, tuberculosis and infant mortality. Overall, recording in the SIM tended to be better because its data relate to the finiteness of life. On the other hand, the highest percentage incompleteness related to HIV/AIDS, diabetes mellitus and other anaemias. The lowest average incompleteness of race/colour data for diseases and disorders was found in the South and Southeast regions and the highest, in the North, Northeast and Mid-West regions. Analysis of the historical series revealed a decreasing trend in percentage incompleteness of race/colour data for the majority of diseases and disorders most prevalent in the black population in the study period, a finding that corroborates other studies 32,33.
Calheiros 32 analysed a historical series of mortality records for cancer of the cervix and of the uterus, part unspecified, from 2000 to 2012, and found a decreasing trend in incompleteness of race/colour records in Espírito Santo, the Northeast and Brazil overall. Another study that evaluated completeness of this field in the SINAN between 2001 and 2013 found a significant reduction in underreporting from 2007 onwards. The percentage of "unknown" or "blank" fields fell from 92.3% in 2001 to 27.1% in 2013 33.
Completion of race/colour data has been found to be poor for most diseases and disorders. Change over time in the completeness of this field for congenital syphilis cases in Minas Gerais, from 2007 to 2015, was classified as poor 34, as it was in a study of completeness of SIM records of women with breast cancer, where data "unknown" in the race/colour field varied between 18% and 35%, which gave completeness a poor rating 35. Also, a time trend study of completeness of the race/colour field for prostate cancer in Espírito Santo, the Southeast region and Brazil, between 2000 and 2010, found incompleteness rates of between 18% and 33%, resulting in a poor standard of quality 36. Generally speaking, incompleteness and fluctuations in completeness percentages over time may relate to several factors: underqualified and understaffed human resources; insufficiencies in information technology, which hinder data flows and updating; notifiers' perception of form filling as exclusively bureaucratic and dissociated from the importance of the data collected or the quality of care 37; and lack of interest and priority on the part of the authorities 38.
A decreasing trend in underreporting in the race/colour record field was observed in all regions, for all diseases and disorders, except for HIV/AIDS and tuberculosis. Despite this reduction, persistent under-recording was observed throughout the historical series in the North, Northeast and Mid-West regions. In Brazil, the information production cycle is partial, disorganised and unsystematic and the North, Northeast and Mid-West regions are the most affected. The most frequent initiatives in information quality management are concentrated in the South and Southeast regions 29 and, historically, these enjoy better data quality. In this regard, Romero et al. 37 calculated percentage incompleteness in recording race/colour in SIM records of deaths among the elderly, by administrative regions and states, from 2000 to 2015. They observed that completeness improved significantly until 2006 and that, from 2007 onwards, the national average was excellent. In 2000, incompleteness was already less than 10% in more than half of Brazil's states, with the exception of the Northeast region, and from 2012 onwards completeness was good or excellent in all states. Despite these advances, that study also found differences in 2015: percentage completeness of race/colour data was excellent in 59.3% of municipalities in the Northeast region and 65.3% in the Southeast, while in the North, South and Mid-West regions, approximately 80% of municipalities showed excellent percentage completeness. The study concluded that inequality also impacts the quality of information.
Vital records of mortality from external causes and infant mortality showed low percentage incompleteness of race/colour data. One study of homicide deaths in Brazil recorded in the SIM from 2000 to 2009 found 90% completeness of the race/colour field 39. The SIM is Brazil's oldest health information system and one of the best known and most widely used for notification, with the greatest coverage, quality and consistency; it is a source for indicators of extreme importance and sensitivity 40 and was the first to include a race/colour field on its forms.
Time series analysis also highlighted a low level of underreporting of race/colour in tuberculosis cases recorded in the SINAN. That lower level of underreporting in records can be attributed to implementation of the National Tuberculosis Control Programme, whose action plan was approved in 2004. This programme allocated funding to improvements in the information system, investments in training and awards to priority municipalities that met the goals set by the Ministry of Health 41.
HIV/AIDS was the disease with the most underreporting and the trend in percentage underreporting was increasing in all Brazilian regions. From 2009 to 2018, 404,938 cases were identified, in 148,682 of which race/colour was unknown (DATASUS, n.d.). AIDS reached epidemic levels in Brazil and began to reflect social inequalities. Despite criticism of racially focused public policies, Paixão and Lopes 11 argued that colour or racial cleavages do influence the incidence of AIDS in population groups and suggested greater investigative efforts with a view to understanding racial asymmetries and their impacts on health outcomes.
Diseases and disorders that are genetically determined or result from unfavourable conditions demarcate, respectively, the biological component and racism as a social determinant of health. Similarly, the recording of race/colour in relation to diseases and disorders whose progression is aggravated or treatment hampered points to the quality of care offered by health measures and services. Indeed, access to diagnostic and therapeutic services is more precarious and difficult for the black population, which consequently has the worst disease progression and prognoses 42. Braz et al. 43 evaluated completeness of the race/colour field in eight HISs and observed that, because completeness was unsatisfactory in most of the data analysed, it was impossible to validate three of the 24 indicators used by the SUS Performance Index. That study highlighted how important recording of race/colour was to construction of the indicators that measure the SUS's performance in promoting equity in care for ethnic and racial groups.
Given the conceptual complexity of ethnic and racial issues in Brazil, it is important to ask how the social and political uses of race information influence how it is collected. Work processes, data capture methods, existing racial classifications in Brazil, the characterisation of population groups and their conditions of life and the political and institutional conjuncture guiding action must all be considered 5. Data collection should be systematic and methodologically sound and requires that health practitioners be properly trained. Araújo et al. 44, who aimed to ascertain health practitioners' knowledge of race/colour classification, found deficiencies in the professional training process as regards recording the variable, as well as recording performed indiscriminately on the basis of individual initiatives and inadequate knowledge of ethnic and racial relations. Interviewees reported the need for training and implementation of institutional measures, recommended including this topic in undergraduate and professional courses in the health field and suggested national campaigns on the importance of the race/colour variable to health care and of research that addresses the issue.
From this perspective, one major concern regarding incorrect, inconsistent or incomplete recording in HISs is the risk of research bias and of distorting information and the actual state of things. Research bias, which can result from negligence in completing fields or failure to recognise the importance of completeness in information systems, makes it impossible to construct indicators that reflect morbidity and mortality profiles by race/colour or to implement effective measures to monitor and reduce ethnic and racial disparities in health 26. Werneck 20 argued that health authorities have not used race/colour data to produce information or to support planning and decision-making, in non-compliance with the PNSIPN 19.
Faulty recording of the race/colour variable in information systems and its underuse by managers and health authorities are historical flaws. In relation to the COVID-19 pandemic, race/colour data were produced, systematised and circulated only after public health practitioners, researchers and associations took action and, even then, unsatisfactorily in some systems. The poor quality of data on morbidity and mortality from COVID-19 in the black population, as with the morbidity and mortality included in this study, suggests institutional racism and public authorities' disregard for equity 45 and for implementation of the PNSIPN.
The PNSIPN is designed to compensate for historical racial discrimination 46 and proposes that management tools be used to produce data and information, by race/colour, considering the specificities of health-disease processes in the black population 47. However, adherence to the PNSIPN in managing the SUS has proven to be insufficient 20.
The study reported here included various different diseases and disorders and health information systems. The discrepancies in completing data fields in relation to diseases and illnesses in different systems implicate various different services, contexts and processes involved in the data collection, classification and racial identification system. International recommendations advocate self-attribution of race/colour. In Brazil, racial identification is established by self-attribution and hetero-attribution 3. In some contexts, for example, death certificates, subjects' race/colour will be hetero-attributed; in outpatient systems, self-attributed. The fact is that data collection, classification and racial identification systems, together with interviewer training and current racial ideologies, will influence completeness. However, studies show that this influence is not very significant 48.
The limitations of this study are inherent to the use of secondary data in epidemiological studies: information reliability, which may be impaired by deficient collection; errors resulting from typing and recording; coverage gaps; and losses in data transmission. Many factors influence data quality and information production. Health information production has not been incorporated into health management processes in Brazil and, as regards the race/colour variable, the process is even more complex.

Final remarks
Time trend analysis of incompleteness in recording of race/colour data in HISs expands the understanding of morbidity and mortality and assists in monitoring events in time and place and in measuring ethnic-racial inequalities. This study found poor percentage completeness and more pronounced incompleteness in relation to certain diseases and disorders and regions, although a decreasing trend was noted in percentage incompleteness of race/colour recording in relation to most diseases and disorders.
The results presented will contribute to increasing visibility of the consequences of incompleteness and the implications of filling in the race/colour field for health equity. Achieving improved record quality or maximum completeness is possible in the medium term, but requires joint efforts by practitioners and managers. In this regard, attention is drawn to the legal responsibilities and attributions of managers, practitioners and workers in this process and to accountability for non-completion and non-compliance with the guidelines set out in the PNSIPN.
Therefore, theoretical and practical initiatives are necessary: policies providing for inclusion of the race/colour variable in all HISs; investment in professional training to ensure systematic collection processes and data generation to acceptable standards; and production of longitudinal studies based on disaggregated analyses, so that the racial dimension of all health events is captured and inequities are highlighted. These actions constitute strategies for combating institutional racism and implementing the PNSIPN, which stipulates mandatory recording of the race/colour variable and the promotion of health equity.

Collaborations
IM Souza contributed to the study conception and design, data analysis and interpretation and drafting of the article. AM Silva Filho contributed to data analysis and interpretation and critical review. EM Araújo contributed to data analysis and interpretation and critical review and approved the version for publication.

Figure 1. Time trend of incompleteness of race/colour records for diseases and disorders most prevalent in the black population, Brazil, 2009-2018.

Figure 2. Time trend of incompleteness of race/colour records for diseases and disorders most prevalent in the black population, North region, Brazil, 2009-2018.
Percentage incompleteness of records of the race/colour variable in relation to diseases and disorders most prevalent in the black population in health information systems, Brazil, 2009-2018.
Note: Incompleteness is the extent of failure to complete the race/colour field. Mental disorders I are mental and behavioural disorders due to use of other substances. Mental disorders II are mental and behavioural disorders due to use of alcohol. Source: SIM/SIH/SINAN, Departamento de Informática do SUS (DATASUS).