Analysis of the quality of prenatal data of pregnant women attended at Healthcare Services in the city of São Paulo between 2012 and 2020

ABSTRACT Objective: To analyze the quality of data collected during prenatal care recorded in the Integrated Health Care Management System (SIGA) of the Municipal Department of Health of São Paulo from 2012 to 2020. Methods: Descriptive study using SIGA data and the variables: maternal height (cm), weight (kg) measured throughout pregnancy, gestational age at prenatal consultation, systolic (SBP) and diastolic (DBP) blood pressure (in mmHg), and body mass index (BMI) at the beginning of pregnancy (up to 8 weeks). Quality analysis was carried out by calculating the indicators: percentage of incompleteness and zero values of all variables studied, percentage of implausible values for height, weight, BMI; preference for terminal digit of weight and height, and normality of distributions. Results: The database of pregnant women made available for analysis included 8,046,608 records and 1,174,115 women. The percentage of incompleteness and zero values was low (<1%) in all original variables of the system. There are more records at the end of pregnancy. For the four original variables of interest in the database (weight, height, SBP, DBP), there is a clear preference for the terminal digit. The variables of interest did not present an approximately normal distribution during the evaluated period. Conclusion: The quality analysis showed the need to improve the standardization of information collection and recording, including the rounding of measurements, and to encourage pregnant women to start prenatal care as early as possible. It is therefore important to invest in data quality through educational resources for professionals who work in health care.


INTRODUCTION
Recording administrative data collected in prenatal care and childbirth can generate a large volume of information about the services provided. These data, when adequately collected, recorded and processed, can support managers in decision-making related to healthcare planning and surveillance 1,2.
The exploration of secondary databases for administrative use, mainly from health information systems, can also substantially contribute to the knowledge of local needs. However, the use of these databases must be preceded by understanding their limits, such as the lack of standardization for collection and errors resulting from the lack of consistency in data entry into the system. Furthermore, it is necessary to know the population coverage to define its representativeness. There are problems in both the quality and type of information obtained; therefore, assessing the completeness and quality of these data is necessary 3.
In the municipality of São Paulo, a system for scheduling appointments, controlling the waiting list and the number of appointments per unit during prenatal care was created in 2004. The Integrated Health Care Management System (Sistema Integrado de Gestão da Assistência à Saúde - SIGA) assists managers in organizing the referral and counter-referral system of healthcare services and hospitals that will perform deliveries. In addition, it makes it possible to characterize the pregnant women seen in the municipality; however, its use has only been administrative in nature, and there has been little or no use of the data for an epidemiological diagnosis.
Statistical analysis of datasets often encounters challenges such as incompleteness, zero values, and implausibility. Incompleteness refers to the absence of data in certain observations, which may compromise the integrity of the analyses. Zero values can indicate different scenarios, from the actual absence of occurrences to reporting issues. Implausibility addresses values that are outside the possible range or that do not make sense in the context of the study. To evaluate the dissimilarity between data sets, the index of dissimilarity is used, which quantifies the differences between data patterns 4. Studies [5-8] show that inconsistencies found in databases can harm the quality of the information made available and the comprehensive assessment of individuals.
SIGA has been implemented in other municipalities in Brazil; nevertheless, to date, there are no quality analyses of the data generated in the municipality of São Paulo. Therefore, this study aimed to analyze the quality of data collected from prenatal care and recorded in the SIGA of the Municipal Department of Health of São Paulo, Brazil, between 2012 and 2020.

METHODS
This is a descriptive study using data from pregnant women registered in SIGA between 2012 and 2020.
The data are collected during routine prenatal care and entered into the system by professionals from the Healthcare Services (Unidades Básicas de Saúde - UBS). Overall, administrative employees collect and type information about users' identification and appointment scheduling, and technicians (physicians, nurses, nutritionists) collect and type other information regarding the care provided, for example, weight, height, and blood pressure.
The resident population of the city of São Paulo in 2020 was estimated by the SEADE Foundation (Sistema Estadual de Análise de Dados - State System for Data Analysis) at 11,869,660 inhabitants 9. In the city's Brazilian Unified Health System (SUS), 469 units were responsible for primary health care in 2020 10; 85% were UBS or Family Healthcare Services, and the others were Healthcare Services/Outpatient Medical Assistance (UBS/AMA). All of these units feed SIGA with data on the care provided in primary health care and specialized outpatient clinics. In the present study, we analyzed data from one of the system's modules, SIGA/Mãe paulistana [Mother from São Paulo], in which data from pregnant women who receive prenatal care at UBS are entered.
Among the SIGA variables, the following were considered for this analysis: maternal height (cm), weight (kg) measured throughout pregnancy, gestational age (in weeks) at the prenatal consultation, systolic blood pressure (SBP) and diastolic blood pressure (DBP), in mmHg. Two derived variables were created from these: the weight measured at the beginning of pregnancy (up to 8 weeks), as this is a variable that allows the calculation of gestational weight gain, and the body mass index (BMI) at the beginning of pregnancy (up to 8 weeks), calculated by dividing the weight measured up to 8 weeks by the maternal height (in meters) squared. BMI is used to diagnose maternal nutritional status at the beginning of pregnancy 11.
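The derivation of BMI at the beginning of pregnancy can be illustrated with a short sketch. It is written in Python for illustration only (the study's analyses were run in R), and the function name, argument names and the choice to return None for unusable records are assumptions, not part of SIGA.

```python
from typing import Optional


def early_pregnancy_bmi(
    weight_kg: Optional[float],
    height_cm: Optional[float],
    gestational_age_weeks: Optional[int],
    max_weeks: int = 8,
) -> Optional[float]:
    """Return BMI (kg/m^2) from a weight measured up to `max_weeks`, else None.

    Hypothetical helper: missing or zero inputs, and weights measured after
    `max_weeks`, yield None, so the derived variable counts as incomplete.
    """
    if weight_kg is None or height_cm is None or gestational_age_weeks is None:
        return None
    if weight_kg <= 0 or height_cm <= 0:
        return None  # zero values are treated as missing
    if gestational_age_weeks > max_weeks:
        return None  # weight was not measured at the beginning of pregnancy
    height_m = height_cm / 100.0  # SIGA stores height in centimeters
    return weight_kg / height_m ** 2
```

For example, a weight of 64 kg recorded at 7 weeks for a woman 160 cm tall gives a BMI of about 25 kg/m^2, whereas the same weight recorded at 12 weeks would leave the derived variable empty.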
The analysis of the quality of the database was carried out by calculating the following indicators: percentage of incompleteness and zero values, percentage of implausible values, preference for the terminal digit, and normality of distributions. These indicators were adapted from indicators used to assess the quality of anthropometric data on children 12.
Incompleteness was defined as the percentage of information not filled in or zero values 13. The degree of incompleteness was defined according to the cutoff points proposed by Romero and Cunha 14: incompleteness below 5% was considered excellent; from 5 to 9.9%, good; from 10 to 19.9%, regular; from 20 to 49.9%, poor; and 50% or more, very poor.
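As a minimal sketch of this indicator, the Python functions below compute the percentage of missing or zero entries and map it to the categories listed above. The function names and the representation of missing data as None are assumptions made for illustration; the study itself used R.

```python
def incompleteness_percentage(values):
    """Percentage of entries that are missing (None) or equal to zero."""
    values = list(values)
    if not values:
        raise ValueError("at least one record is required")
    incomplete = sum(1 for v in values if v is None or v == 0)
    return 100.0 * incomplete / len(values)


def classify_incompleteness(pct):
    """Map a percentage to the cutoff points proposed by Romero and Cunha."""
    if pct < 5:
        return "excellent"
    if pct < 10:
        return "good"
    if pct < 20:
        return "regular"
    if pct < 50:
        return "poor"
    return "very poor"
```

With these cutoffs, an incompleteness below 1% is classified as excellent and an incompleteness of 75.69% as very poor.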
Implausible values were defined differently for each variable of interest, and, whenever possible, external references were used. For maternal age, values below 10 and above 55 years were considered implausible; this criterion was based on the age profile of mothers of live births in Brazil 15. For height and BMI, Z-scores below -5 or above 5 were considered implausible, and for weight, Z-scores below -6 or above 6 16. The World Health Organization (WHO) curves 17 were used to define the cutoff points.
Because there is no reference curve for those over 19 years of age, the same curve was used for all women, considering the age of 19 years for every adult pregnant woman. This provided a single cutoff point for the distribution of Z-scores and greater similarity between the values of adults and adolescents.
For SBP and DBP, Z-scores outside the range of -6 to 6 of the sampling distribution for each trimester of pregnancy were considered implausible values.
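The rule for blood pressure can be sketched as follows, in illustrative Python rather than the R code used in the study. Several details are assumptions: the text only states that the last trimester starts at 27 weeks, so the 14-week boundary between the first and second trimesters is hypothetical, as are the function names and the use of the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev


def trimester(gestational_age_weeks):
    """Assign a trimester; the 14-week boundary is an assumption."""
    if gestational_age_weeks < 14:
        return 1
    if gestational_age_weeks < 27:
        return 2
    return 3


def flag_implausible(records, limit=6.0):
    """Flag values whose within-trimester Z-score lies outside [-limit, limit].

    `records` is a list of (gestational_age_weeks, value) pairs, e.g. SBP in mmHg.
    Returns one boolean per record, in the same order.
    """
    groups = defaultdict(list)
    for weeks, value in records:
        groups[trimester(weeks)].append(value)

    # Mean and sample standard deviation of each trimester's distribution.
    stats = {t: (mean(v), stdev(v)) for t, v in groups.items() if len(v) > 1}

    flags = []
    for weeks, value in records:
        t = trimester(weeks)
        if t not in stats or stats[t][1] == 0:
            flags.append(False)  # Z-score undefined; nothing to flag
            continue
        center, spread = stats[t]
        flags.append(abs((value - center) / spread) > limit)
    return flags
```

Because the Z-scores are computed from the sample itself, an extreme value also inflates the standard deviation; a single outlier can exceed 6 standard deviations only when the trimester group is reasonably large, which is not a concern with a database of this size.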
The preference for terminal digits was evaluated using graphs and by calculating the index of dissimilarity, obtained using the formula (Equation 1). The index of dissimilarity can be interpreted as the percentage of values that would need to be redistributed so that the distribution of the final digits of the variable of interest is uniform (that is, without rounding). Values above 20% indicate preference for terminal digit and rounding to 0 or 5 12.
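The equation itself is not reproduced in this text. A minimal sketch, assuming the usual definition of the index of dissimilarity for terminal digits (half the sum, over the ten digits, of the absolute difference between the observed percentage and the 10% expected under a uniform distribution), is given below in illustrative Python. The function names and the `decimals` parameter, which states how many decimal places the measurement is recorded with, are hypothetical.

```python
from collections import Counter


def terminal_digit(value, decimals=0):
    """Last recorded digit of a measurement stored with `decimals` decimal places."""
    scaled = round(abs(value) * 10 ** decimals)  # round() guards against float noise
    return scaled % 10


def index_of_dissimilarity(values, decimals=0):
    """Percentage of values to redistribute for terminal digits to be uniform."""
    digits = [terminal_digit(v, decimals) for v in values]
    if not digits:
        raise ValueError("at least one value is required")
    counts = Counter(digits)
    n = len(digits)
    expected_pct = 10.0  # each of the ten digits under a uniform distribution
    return 0.5 * sum(
        abs(100.0 * counts.get(d, 0) / n - expected_pct) for d in range(10)
    )
```

Under this definition, values whose terminal digits are evenly spread give an index of 0%, while values that all end in 0 give 90%, far above the 20% threshold.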
The kurtosis coefficient indicates the thickness of the tails of the distribution compared to the normal distribution. A normal distribution has an (excess) kurtosis coefficient equal to zero. Positive values of kurtosis indicate that the tails of the distribution are heavier than those of the normal distribution (leptokurtic distribution), and negative values indicate lighter tails (platykurtic distribution) 18. All analyses were performed in the R software, version 4.1.0 19.
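To make the normality indicators concrete, the sketch below computes the standard deviation, the asymmetry (skewness) coefficient and the excess kurtosis from central moments. It is illustrative Python, not the R code used in the study; the function name and the use of population (biased) moments are assumptions, and R packages may apply small-sample corrections that give slightly different values.

```python
def distribution_summary(values):
    """Return (standard deviation, skewness, excess kurtosis) from central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    sd = m2 ** 0.5
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return sd, skewness, excess_kurtosis
```

For Z-scores that follow a standard normal distribution, the three values would be close to 1, 0 and 0, respectively; a large positive excess kurtosis signals heavy tails.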
This study is part of the research project entitled Como tornar as intervenções no parto e seus desfechos mais visíveis aos sistemas de informação? ["How to make childbirth interventions and their outcomes more visible to information systems?"] (project with funding already approved in the Call for Data Science for Maternal and Child Health CNPq/Bill & Melinda Gates Foundation/2020/2022), approved by the Research Ethics Committee of the Municipal Department of Health of São Paulo, under number 4.829.5.

RESULTS
The SIGA database of pregnant women made available for analysis included 8,046,608 records and 1,174,115 women from 2012 to 2020. The number of records varied by year of follow-up, with an increase in the number of consultations registered in the system from 2015 onward and a small reduction in 2018. This number remained stable between the months and years of 2018 to 2020 (Supplementary Figure 1).
The percentage of incompleteness and zero values was low (<1%) in all original variables of the system, a level deemed excellent 14, and we observed no variations in these percentages over the years. For the derived variables weight and BMI at the beginning of pregnancy, the percentage of incompleteness was high throughout the period: it was considered very poor 14 in 2012 and remained above 60% until 2017. A downward trend in the percentage of incompleteness of these two variables was nonetheless observed over time (for both variables, 75.69% in 2012 versus 49.93% in 2020). All registered women had at least one weight and one height record. The percentage of implausible Z-scores was also low across all variables of interest (<0.5%), with relative stability over time (Table 1 and Supplementary Figure 2).
Regarding the number of records (consultations) according to gestational age, the largest volume occurred at the end of pregnancy, especially in the last trimester (from 27 weeks onward) (Supplementary Figure 3). This pattern was repeated in all evaluated years: between 2012 and 2020, 3,885,165 records (48.28%) occurred between 27 and 40 weeks of gestation.
For weight, height, SBP and DBP, there is a clear preference for terminal digit, that is, the measured values are clearly rounded to 0 or 5 (Figure 1). This rounding pattern is confirmed by the values observed for the index of dissimilarity, which is greater than 20% for weight, SBP and DBP in all evaluated years (Table 2). For maternal height, although values ending in 0 and 5 were more frequent than those ending in other digits (Figure 1), the index of dissimilarity remained close to 10% in the period (Table 2).
The variables of interest did not present an approximately normal distribution during the evaluated period. Overall, the variables presented standard deviation values close to 1, with the lowest value observed for DBP in 2014 (0.752) and the highest for BMI at the beginning of pregnancy in 2020 (1.223). SBP and DBP showed negative (left) asymmetry throughout the period, except for SBP in 2012. The other variables presented an asymmetry coefficient close to zero. All variables presented a kurtosis coefficient above 0; the lowest value was 2.95 (BMI at the beginning of pregnancy) and the highest was 17.85 (DBP), which indicates leptokurtic distributions (Table 3).

DISCUSSION
Studies on data quality in developing countries are limited 20. This study is the first to assess the quality of the data collected during prenatal care and recorded in SIGA. There is no consensus regarding data quality assessment criteria 21. In this study, we used the percentage of incompleteness and zero values, the percentage of implausible values, preference for terminal digit, and normality of distributions. For the percentage of incompleteness, values classified as very poor were found in the variables initial weight and initial BMI throughout the analyzed period. High incompleteness values were also found by Romero and Cunha 14, who evaluated the quality of socioeconomic and demographic information in the Brazilian Mortality Information System (Sistema de Informações sobre Mortalidade - SIM) by Federative Unit. However, the percentage of incompleteness and zero values was low in all of the system's original variables, which indicates good data quality in this regard. A database must be complete and reliable regarding its records 22. When these data are inconsistent, the reliability of the information is compromised and false diagnoses about the health situation can be established 23.
Regarding the preference for terminal digit, this study found a preference for the terminal digits 0 or 5 in the variables weight, height, SBP and DBP, similar to the 2019 National Survey of Food and Child Nutrition (Estudo Nacional de Alimentação e Nutrição Infantil - ENANI) 16, which found a preference for terminal digit for weight and height. We found low percentages of implausible Z-score values in all studied variables (<0.5%), as can be seen in Supplementary Figure 2, remaining within the internationally recommended range (1%) 12.
According to the Brazilian Ministry of Health, pregnant women must have at least six prenatal consultations, preferably one in the first trimester, two in the second trimester, and three in the third trimester of pregnancy 24. In this study, we show an increase in the number of prenatal consultation records over the period, but it is not possible to know whether the number of consultations actually increased or whether more records were made in the system. The low percentage of consultation records at the beginning of pregnancy is similar to the findings of Domingues et al. 25 and Kac et al. 11, who show a high proportion of Brazilian pregnant women starting prenatal care after 12 weeks.
Weight measurement must be carried out at all prenatal consultations, and height must be measured at least during the first consultation of pregnant women 26. In the SIGA data, we can observe that all women have at least one weight and one height record during pregnancy. However, when evaluating weight information collected at the beginning of pregnancy (up to 8 weeks), the percentage of incompleteness is high throughout the evaluated period. The absence of weight recording at the beginning of pregnancy is reflected in the BMI during the period, and represents an important limitation for characterizing the nutritional status of these individuals and using indicators recommended by the Ministry of Health such as weight gain during pregnancy. In this case, the use of weight at the time of pregnancy diagnosis could be recommended, as long as pregnant women were instructed to record their weight at that time.
Pre-pregnancy BMI and weight gain during pregnancy are related to fetal and neonatal development as well as obstetric outcomes 27. This highlights the importance of collecting quality information during the prenatal period: it contributes to more effective monitoring and to more accurate and valuable indicators, which can improve public policies that offer specific nutritional guidance for this group and expand pregnant women's access to less processed foods, with lower fat, sodium and sugar content and greater amounts of vitamins and minerals 28,29.
We identified some limitations of the system (SIGA), such as gestational age being recorded in weeks and not in days, which, together with non-standard rounding, reduces the precision of this variable 30. Another limitation of the system was the lack of standardization of collection. In the municipality of São Paulo, the process of collecting prenatal data, as well as other data, takes place in more than 469 units 10, and these data are entered by the professionals in charge in each unit. The in-service training of these professionals does not always meet their needs, sometimes resulting in inconsistent, non-standardized, and incomplete data 31. In the case of the weight and height of pregnant women, both the measurement techniques and the recording may be incorrect, hence the importance of providing instruments to the professionals involved and returning the data for analysis to the units responsible for collecting and typing them.
This study is the first to use SIGA data, which makes it important for managers' knowledge of data from maternity wards and UBS in their territory. Despite this strength, the present study has some limitations, such as the use of quality indicators adapted from those used for children, as there are no specific indicators for pregnant women.
The analyses carried out in this study showed the need to improve the standardization of the collection and recording of information, for example, by stipulating clear rules for rounding measurements for greater precision. Another important aspect is the need to encourage pregnant women to start prenatal care as soon as possible, to be properly informed and to actively participate in monitoring their personal health indicators (such as weight, blood glucose and blood pressure), as well as the need to monitor and improve the quality of these data in service records.
Improving the data recorded in SIGA would make it possible to expand their use for calculating indicators and monitoring public policies aimed at pregnant women. In this sense, it is extremely important to invest in the quality of data through educational resources for those responsible for filling in the data 32. The regular dissemination and use of SIGA prenatal information should be encouraged, as it is valuable for epidemiological analyses and can contribute to improving its quality 1. Moreover, we highlight the importance of evaluating the quality of the SIGA database, as it can be linked, for example, with the live birth database, SINASC (Sistema de Informação sobre Nascidos Vivos - Brazilian Live Birth Information System), to analyze outcomes in relation to the baby, the mother, and the delivery condition.

Table 1. Absolute and relative frequency of incompleteness values, zero values and implausible Z-scores in the variables of interest in the pregnant women module of the Integrated Health Care Management System of the Municipal Department of Health of São Paulo, municipality of São Paulo, 2012 to 2020.

Figure 1. Identification of preference for terminal digit in anthropometric variables: (A) weight during pregnancy (kg); (B) maternal height (m); (C) systolic blood pressure (mmHg); and (D) diastolic blood pressure (mmHg) from the pregnant women module of the Integrated Health Care Management System of the Municipal Department of Health of São Paulo, 2012 to 2020.