Delay in death reporting affects timely monitoring and modeling of the COVID-19 pandemic

This study describes the COVID-19 death reporting delay in the city of São Luís, Maranhão State, Brazil, and shows its impact on timely monitoring and modeling of the COVID-19 pandemic, while seeking to ascertain how far nowcasting can mitigate the effects of death reporting delay. We analyzed COVID-19 death data reported daily in the Epidemiological Bulletin of the State Health Secretariat of Maranhão and calculated the reporting delay from March 23 to August 29, 2020. A semi-mechanistic Bayesian hierarchical model was fitted to illustrate the impact of death reporting delay and to test the effectiveness of Bayesian nowcasting in improving data quality. Only 17.8% of deaths were reported without delay or the day after, while 40.5% were reported more than 30 days late. Following an initial underestimation due to reporting delay, 644 deaths were reported from June 7 to August 29, although only 116 deaths occurred during this period. The Bayesian nowcasting technique partially improved the quality of mortality data during the peak of the pandemic, providing estimates that better matched the observed scenario in the city, but it was no longer useful nearly two months after the peak. As delay in death reporting can directly interfere with assertive and timely decision-making regarding the COVID-19 pandemic, the Brazilian epidemiological surveillance system must be urgently revised and reporting of the date of death must be made mandatory. Nowcasting has proven somewhat effective in improving the quality of mortality data, but only at the peak of the pandemic.

Keywords: COVID-19; SARS-CoV-2; Mortality Registries; Data Accuracy

Correspondence: M. A. G. Campos, Programa de Pós-graduação em Saúde Coletiva, Universidade Federal do Maranhão. Rua Barão de Itapari 155, São Luís, MA 65020-070, Brasil. marcos.adrianogc@gmail.com

1 Universidade Federal do Maranhão, São Luís, Brasil.
This article is published in Open Access under the Creative Commons Attribution license, which allows use, distribution, and reproduction in any medium, without restrictions, as long as the original work is correctly cited.
Carvalho CA et al. Cad. Saúde Pública 2021; 37(7):e00292320


Introduction
Surveillance of mortality data is a key tool for public health, as it allows monitoring of the dynamics and impact of health-related events 1, and it gains greater relevance during pandemics and the emergence of new diseases. But for mortality data to fulfill its role in following an outbreak, death notifications must be kept up to date 2.
Since the onset of COVID-19 transmission in Brazil, the country has had notable difficulty in identifying new cases. Few tests were performed, and the case reporting rate was estimated at only 9.2% 2,3,4,5. A seroprevalence study conducted in São Luís, State of Maranhão, showed that only 3.4% of infections were reported 6. Another issue regarding case counts is the delay in case reporting 7. Modelling of infectious diseases is more accurate when based on cases by date of symptom onset, data that are rarely available. Moreover, most case notifications were based on rapid antibody tests (lateral flow immunoassays) rather than molecular ones, with only 38% of cases in Brazil being identified by RT-PCR (reverse transcription polymerase chain reaction). This reflects a low capacity for timely diagnosis of active SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infections 8.
High underreporting, unavailability of separate RT-PCR-based case reporting, and lack of data on date of symptom onset hinder COVID-19 modelling based on case reporting. By using cases in modelling, we are dealing with an outdated picture of the disease, looking at the transmission dynamics through the rear-view mirror. Mortality data attributed to COVID-19 therefore tend to be a more accurate indicator for monitoring the pandemic, since the underreporting of deaths is expected to be lower than that of cases 9,10.
But mortality data are not without problems. An analysis conducted by the COVID-19 BR Observatory estimated that 61% of deaths in Brazil took more than 10 days to be reported, only 3% were reported one day after occurring, and very few (0.17%) were reported on the date of death 11. Death reporting delay in Brazil is therefore high, and the notified data reflect an already outdated scenario. Some studies point to RT-PCR shortage, deaths attributed to other causes with similar clinical manifestations, and false negatives due to quality control problems in nasal swab collection as possible causes of COVID-19 death underreporting 2. Because of these issues, caution is needed when using reported data from information systems or epidemiological bulletins to analyze the pandemic scenario.
One way to improve the quality of this information and allow a less biased use of the reported data is nowcasting. This approach seeks to estimate, at a given point in time, the number of events that have occurred but have not yet been reported 12 . It generates a distribution of the reporting delay from observations where the occurrence and reporting date of the event of interest are known. Given this distribution and the number of events reported at a given time, one can infer the actual number of events that occurred. The result is a pandemic curve closer to the current state of the outbreak 13 .
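The idea described above can be illustrated with a deliberately simple, non-Bayesian sketch: estimate the empirical delay distribution from records whose occurrence and reporting dates are both known, then scale up the counts reported so far by the probability that an event of that age would already have been reported. This is not the method used in this study; the function names (`delay_cdf`, `nowcast`) and the toy numbers are hypothetical and serve only to show the logic.

```python
from collections import Counter


def delay_cdf(delays, max_delay):
    """Empirical P(delay <= d) for d = 0..max_delay, from observed delays in days."""
    counts = Counter(min(d, max_delay) for d in delays)
    total = len(delays)
    cdf, running = [], 0
    for d in range(max_delay + 1):
        running += counts.get(d, 0)
        cdf.append(running / total)
    return cdf


def nowcast(reported_by_day, cdf):
    """Scale each day's reported count by the chance it was reportable by 'now'.

    reported_by_day is ordered oldest to newest; the last entry is today (age 0).
    """
    n = len(reported_by_day)
    estimates = []
    for i, count in enumerate(reported_by_day):
        age = n - 1 - i                      # days elapsed since occurrence
        p = cdf[min(age, len(cdf) - 1)]      # share expected to be reported so far
        estimates.append(count / p if p > 0 else float("nan"))
    return estimates


# Hypothetical toy data: ten historical delays and five days of partial counts.
delays = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
cdf = delay_cdf(delays, max_delay=4)
reported = [10, 8, 6, 3, 1]
estimates = nowcast(reported, cdf)
print(cdf)
print(estimates)
```

In this toy case the most recent days, which look like a sharp decline in the raw counts, are scaled up the most, because only a small share of their events is expected to have been reported yet.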
Besides Brazil, delays in death reporting have also been described in high-income countries such as Sweden, Germany and the United States 13,14. Seeking ways to minimize the effect of delay in official data reporting is relevant wherever reporting information is of poor quality, in order to follow the pandemic dynamics more reliably and to implement public health measures. Thus, this study describes the COVID-19 death reporting delay in the city of São Luís and shows its impact on timely monitoring and modelling of the COVID-19 pandemic, while seeking to ascertain how far nowcasting can mitigate the effects of death reporting delay.

Methods
This is a descriptive study conducted using data on COVID-19 deaths reported daily in the Epidemiological Bulletin of the State Health Secretariat of Maranhão 15.
Death reporting delay was calculated as the difference in days between occurrence date and reporting date, from March 23 to August 29, 2020 (when the death reporting delay was significantly reduced). This variable was summarized as median and interquartile range (IQ) and shown in boxplots by Epidemiological Week of reporting. We also plotted the weekly number of deaths by date of reporting and by date of occurrence, together with the nowcasting estimates by date of occurrence.
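The delay calculation described above can be sketched as follows. The records are hypothetical (occurrence date, reporting date) pairs, not study data, and the choice of the "inclusive" quantile method is an assumption, since the text does not state which quantile definition was used.

```python
from datetime import date
from statistics import quantiles


def reporting_delays(records):
    """Delay in days between occurrence and reporting for each (occurred, reported) pair."""
    return [(reported - occurred).days for occurred, reported in records]


def median_and_iqr(delays):
    """Return (median, interquartile range) using inclusive quartiles (an assumption)."""
    q1, q2, q3 = quantiles(delays, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical records for illustration only.
records = [
    (date(2020, 5, 1), date(2020, 5, 1)),   # reported the same day
    (date(2020, 5, 2), date(2020, 5, 3)),
    (date(2020, 5, 3), date(2020, 5, 5)),
    (date(2020, 5, 4), date(2020, 5, 9)),
    (date(2020, 5, 5), date(2020, 6, 4)),   # reported 30 days late
]

delays = reporting_delays(records)
median_delay, iqr = median_and_iqr(delays)
print(delays, median_delay, iqr)
```

Grouping the same delays by week of reporting before summarizing them would give the per-week medians shown in the boxplots.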
To adjust for incomplete notification data in recent weeks, we followed the approach of McGough et al. 12. It was implemented with the R package NobBS (https://www.r-project.org/), using a negative binomial model for death reporting delay, with an adaptation phase and a burn-in of 5,000 iterations each, and the software's default values for the remaining parameters.
To illustrate the impact of death reporting delay and to test the effectiveness of nowcasting in improving data quality, we fitted a semi-mechanistic Bayesian hierarchical model to death data by reporting date and to nowcasted death data by occurrence date. This model uses death data (not cases), which we consider less susceptible to underreporting and therefore more appropriate for studying the magnitude and trend of the pandemic. It was originally proposed by Flaxman et al. 16 to describe the number of infections, number of deaths and time-varying effective reproduction number (Rt) of COVID-19 based on death counts. The Bayesian smoothing nowcasting method, in turn, models the delay distribution based on the available data on onset date (occurrence date) and reporting date of cases or deaths.
The approach consists in modeling daily deaths $D_t$ with a negative binomial distribution, parameterized by its mean and variance 16:
$$D_t \sim \text{Negative Binomial}\left(d_t,\; d_t + \frac{d_t^2}{\psi}\right)$$
where $d_t$ represents the expected number of deaths attributed to COVID-19 on day $t$ and $\psi$ follows a half-normal distribution, $\psi \sim \mathcal{N}^{+}(0, 5)$ 16. The calculation of $d_t$ takes into account the number of new infections on day $t$, namely $C_t$, and the probability of death $\pi$, and is given by a weighted sum:
$$d_t = \sum_{\tau=0}^{t-1} C_\tau \, \pi_{t-\tau}$$
i.e., the expected number of deaths on day $t$ is a sum of previous infections weighted by their probability of death 16.
By considering the temporal relation between death and its reporting, the model relies on a more realistic estimation process. It was later adapted to incorporate Google mobility data, assuming that population mobility patterns are linked to transmission intensity, and used to analyze the pandemic in 16 Brazilian states 17. Google mobility data were described by four covariates: mobility in residential areas, transit stations, parks, and the average of grocery, pharmacy, retail and workplace areas. For this study, we used data for the city of São Luís publicly available at https://www.google.com/covid19/mobility/.
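The weighted sum that defines the expected number of deaths can be sketched in a few lines. The infection counts and infection-to-death probabilities below are made-up toy values, and `expected_deaths` and `nb_variance` are illustrative names rather than functions from the model's published code; the variance formula assumes the mean and variance parameterization of the negative binomial used by Flaxman et al.

```python
def expected_deaths(infections, pi):
    """d_t = sum over tau < t of infections[tau] * pi_(t - tau).

    pi[s - 1] holds the probability of dying s days after infection (s >= 1).
    """
    d = []
    for t in range(len(infections)):
        total = 0.0
        for tau in range(t):
            lag = t - tau                    # days between infection and death
            if lag <= len(pi):
                total += infections[tau] * pi[lag - 1]
        d.append(total)
    return d


def nb_variance(mean, psi):
    """Variance of the negative binomial with the given mean and overdispersion psi."""
    return mean + mean ** 2 / psi


# Hypothetical toy inputs: daily new infections and a three-day death-lag profile.
infections = [100, 200, 400, 800]
pi = [0.0, 0.01, 0.02]

d = expected_deaths(infections, pi)
print(d)
print(nb_variance(d[3], psi=5.0))
```

Because each day's expected deaths depend only on earlier infections, a delay in recording deaths shifts or flattens the death series the model is fitted to and, through this sum, the infections and Rt it infers.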
We fitted this model at two time points: first with data until May 9, 2020, and then with data until June 30, 2020. May 9, 2020 was chosen because it was the date when COVID-19 deaths in the city of São Luís reached their peak; June 30, 2020, in turn, corresponds to a time when the peak had already passed but the number of death reports continued to increase. Each analysis compared estimates of the daily number of infections and deaths and of the time-varying reproduction number (Rt), considering deaths by notification date, by occurrence date, by occurrence date with notification delay corrected by nowcasting, and by occurrence date including deaths notified until December 24, 2020, a date by which most deaths from the first half of 2020 had already been reported. Besides São Luís, we also included data on the other 15 state capitals from the original report 17 to allow partial pooling of the covariate coefficients of the model.
All analyses were performed using R version 4.0.2 18 (https://www.r-project.org/) and the graphics were plotted using the ggplot2 package 18. Figure 1 presents the boxplots of COVID-19 death reporting delay in days by reporting week. Delays can be observed from the first death report, on March 29. In the weeks from June 14 to August 29, deaths were reported with greater delay, reaching a median of 103 days between August 16 and 22 and a maximum delay of 126 days on August 25.

Results
Only 17.8% of deaths were reported without delay or the day after, while 40.5% were reported more than 30 days late, with a median of 14 days and IQ of 53 days. From April 19 to June 13, 2020, the period with the highest number of deaths due to COVID-19 in São Luís, the median delay was 5 days (IQ = 10). After this period, the median delay increased to 57 days (IQ = 28), with 81.6% of deaths showing a delay greater than 30 days (Table 1).
Figure 2 shows the weekly number of observed deaths by occurrence and reporting date. The notification curve is, in general, delayed when compared to the observed deaths by occurrence date. As a result, deaths were initially underestimated: deaths by occurrence date peaked at 214 in the week from May 3 to 9, 2020, whereas notifications peaked between May 10 and 16, with 106 deaths reported. From June 7 to August 29, 2020, in turn, we observed a high volume of reports of deaths that had occurred before this period: 644 deaths were reported, but only 116 deaths occurred in these weeks.
Figure 3 presents the daily number of infections, deaths and time-varying reproduction number (Rt) until May 9, estimated by a semi-mechanistic Bayesian hierarchical model of COVID-19. For the model fitted to deaths by notification date, the daily number of infections and deaths showed an increasing pattern and Rt was above 1. By May 9, the model estimated 20 deaths per day (Figure 3a). For estimates by occurrence date, the daily number of infections and deaths rose early in the pandemic and almost levelled off by May 9, with an estimate of 10 deaths per day and the 50% credible interval for Rt below 1, suggesting a deceleration of the pandemic (Figure 3b).
The estimates by occurrence date with death reporting delay corrected by nowcasting showed an increasing daily number of infections and deaths and Rt above 1; by May 9, however, the model estimated nearly 30 deaths per day, 3 times the estimate obtained without nowcasting (Figure 3c). As for estimates by occurrence date including all deaths reported until December 2020, we found an increasing daily number of infections and deaths and Rt above 1. However, the model estimated 45 deaths per day by May 9, 50% higher than the number obtained by nowcasting and 4.5 times the uncorrected estimate (Figure 3d).
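The ratios quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (weekly peaks of 214 deaths by occurrence and 106 by reporting; 644 reported versus 116 occurred deaths; modelled deaths per day of roughly 10, 30 and 45 on May 9); the helper `pct_change` is an illustrative name and not part of the study's code.

```python
def pct_change(new, baseline):
    """Percentage difference of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Weekly peaks: deaths by occurrence date versus deaths by reporting date.
peak_occurrence, peak_reported = 214, 106
reported_vs_actual = pct_change(peak_reported, peak_occurrence)   # reported peak relative to actual
actual_vs_reported = pct_change(peak_occurrence, peak_reported)   # actual peak relative to reported

# Deaths reported versus deaths that occurred from June 7 to August 29, 2020.
late_report_ratio = 644 / 116

# Modelled deaths per day on May 9: uncorrected, nowcast, and complete data.
uncorrected, nowcasted, complete = 10, 30, 45
nowcast_vs_uncorrected = nowcasted / uncorrected
complete_vs_nowcast = pct_change(complete, nowcasted)
complete_vs_uncorrected = complete / uncorrected

print(round(reported_vs_actual, 1), round(actual_vs_reported, 1))
print(round(late_report_ratio, 2))
print(nowcast_vs_uncorrected, complete_vs_nowcast, complete_vs_uncorrected)
```

The first two quantities show why the choice of baseline matters: the same pair of peaks gives a difference of about half when measured against the actual peak and about double when measured against the reported one.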

Discussion
Our results showed a high delay in COVID-19 death reporting in the city of São Luís, which affected the city's ability to monitor the pandemic dynamics over time, first underestimating and more recently overestimating the actual number of deaths. We also observed that using death data affected by reporting delay altered the results of the Bayesian model of COVID-19, changing its estimates and depicting a scenario incompatible with the reality seen in São Luís at the time. The use of the Bayesian nowcasting technique to minimize the effect of delay in death notification partially improved the quality of mortality data during the peak of the pandemic, yielding estimates that better matched the scenario observed in the city, but it became less useful almost 2 months after the peak.
The peak in death notifications was about 50% lower than the actual peak of deaths in the city (106 versus 214 weekly deaths; equivalently, the actual peak was 102% higher) and took place one week later. Thus, the reporting delay caused the pandemic to be seen retrospectively and to appear smaller in the official records. Measures taken to combat COVID-19 could therefore have been out of step with the current epidemiological dynamics if only death reporting data were considered in decision-making.
From June 7 to July 18, 2020, the Epidemiological Weeks after the peak, deaths by reporting date showed an upward curve incompatible with the scenario observed in loco, characterized by a reduction in Rt to values below 1 19 and a decrease in the occupancy of clinical and intensive care unit (ICU) beds 20. This upward death curve resulted from death reporting delays, which had a median of 57 days between June 14 and August 29, 2020, with 81.6% of deaths notified more than 30 days after their occurrence. Death reporting delays therefore led to an overestimation of the number of deaths as of June 7, 2020.
Difficulty in obtaining properly updated mortality data can hinder decision-making, increasing the likelihood that disease control actions will not be implemented at the most opportune times. Following a court order, São Luís enforced a lockdown from May 5 to 17, 2020, motivated, among other factors, by the high and growing number of deaths by COVID-19 recorded in the city 21. The analysis of deaths by occurrence date shows that the start of the lockdown coincided with the peak of deaths, which declined in the following week. As the estimated mean time from infection to death is around 23 days 22,23, one can conclude that the lockdown only had an impact on COVID-19 mortality after its peak, when a decreasing trend was already observed. Moreover, the peak number of deaths that prompted the lockdown decree (April 30, 2020) had already been reached at least one week earlier. If the death surveillance system in São Luís had been fully up to date, the decision to implement the lockdown would likely have been made earlier. Timely decision-making is crucial in a pandemic like COVID-19, with its very rapid transmission dynamics and effects that profoundly affect the health system and the economy.
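The timing argument above can be made concrete with simple date arithmetic. The dates and the 23-day mean come from the text; treating the mean infection-to-death interval as a fixed lag is a simplifying assumption made only for illustration, and the variable names are hypothetical.

```python
from datetime import date, timedelta

lockdown_start = date(2020, 5, 5)
peak_week_end = date(2020, 5, 9)              # peak week of deaths by occurrence: May 3 to 9
infection_to_death = timedelta(days=23)       # approximate mean, used here as a fixed lag

# Infections averted on the first lockdown day would show up as averted deaths about here.
first_expected_effect = lockdown_start + infection_to_death
days_after_peak_week = (first_expected_effect - peak_week_end).days

print(first_expected_effect.isoformat(), days_after_peak_week)
```

Under this assumption, the first effect of the lockdown on mortality would be expected well after the peak week, consistent with the interpretation that the decline in the week following the peak was not yet attributable to the lockdown.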
Death reporting delay also negatively impacts the results of statistical models that use death numbers as a starting point to estimate the number of cases and Rt. Using the same Bayesian model implemented by Imperial College London to analyze the pandemic dynamics in 16 Brazilian states 17, we estimated the daily number of cases, deaths and Rt for the city of São Luís on May 9, 2020, during the pandemic peak. Models fitted to deaths by reporting date correctly showed the upward pandemic trend, but underestimated its magnitude. Models fitted to deaths by occurrence date, on the other hand, suggested a flatter increase, correctly predicted Rt below 1 in early May, but grossly underestimated the magnitude of the pandemic. The nowcasting-corrected estimates, in turn, came close to the real situation and correctly predicted Rt, but still underestimated the magnitude of daily infections and deaths.
We also estimated the daily number of cases, deaths and Rt as of June 30, 2020, after the peak, but when death reporting was still on the rise. Models fitted to deaths by reporting date proved unusable, as they erroneously predicted that the pandemic was still on the rise when the worst was over. Almost 2 months after the peak, models fitted to deaths by occurrence date, whether nowcasting-corrected, uncorrected, or including deaths notified until December, gave more accurate estimates of the daily number of infections and deaths and of Rt.
These results suggest that nowcasting is beneficial for correcting estimates of the daily number of cases, deaths and Rt, thus allowing for better monitoring of the pandemic when faced with a large delay in death reporting. But nowcasting still underestimated the magnitude of the pandemic. After the pandemic peak, nowcasting ceased to be advantageous, as over time most of the previously unreported deaths had already been included. After the peak, analyses based on deaths by occurrence date corresponded more closely to the pandemic dynamics in the city, which was showing a reduction in the number of new cases and a decrease in the occupancy of clinical and ICU hospital beds 20.
Using death data unadjusted for reporting delay can result in estimates or interpretations that correspond to a reality of days or even weeks past, so that the data cannot fulfill their purpose of helping to predict pandemic trends over time, as shown in this study. Places with significant death reporting delay should avoid using deaths by reporting date as a parameter for monitoring the pandemic. In this context, mortality data by occurrence date, or adjusted by nowcasting, are better options for timely pandemic monitoring and decision-making. In cities with high-quality mortality data and shorter reporting delay, we expect the impact of nowcasting to be smaller.
The median delay of death reporting increased over time, with the most recent Epidemiological Weeks showing a delay ten times greater than the previous weeks. Most deaths reported from June 7 to August 29, 2020 corresponded to deaths that occurred in April and May 2020, when the city of São Luís experienced the peak of the disease and the number of suspected cases was very high 24. This scenario overwhelmed the public testing laboratories. We believe that during this time testing priority was given to suspected cases rather than deaths, which were mostly tested and reported after the peak. Another factor that may have contributed to the observed delay in death reporting is failure in reporting and logistics management. Increases in the number of deaths should always be analyzed considering the delay in notification, otherwise they may generate "false alarms" of a second wave.
A limitation of the present study is the possibility that underreporting of deaths due to COVID-19 may still be present, leading to underestimation of the Bayesian model results. A second limitation is that the Google mobility data is an estimate based exclusively on individuals who use this technology, so it may not be representative of the population. Moreover, demographic groups most affected by COVID-19, such as older adults, may be underrepresented in this data, as may more vulnerable socioeconomic groups 25 . It should be noted, however, that Google mobility data is the only publicly available mobility indicator for São Luís, and the use of a mobility measure is key to improve R t estimation 26 . Thus, we believe that the benefits of its use outweigh the bias due to possible underrepresentation.
On the other hand, a strong point of the study is the use of Bayesian nowcasting to minimize reporting delay and to track pandemic dynamics based on death data. Bayesian models are flexible enough to capture relevant data properties. We can incorporate information on underreporting of deaths and define the prior distribution. These data are combined with the information from the prior distribution to predict deaths.
This study observed that the delay in reporting deaths due to COVID-19 in São Luís was high and impacted the timely monitoring of pandemic dynamics. These problems can directly interfere with assertive and timely decision making, particularly in the face of rapidly spreading pandemics like COVID-19, with serious health, economic, and social repercussions. Our findings therefore point to institutional weaknesses in ensuring the quantity and quality in the recording of data necessary to describe the health reality. Moreover, these results alert to the need to review the epidemiological surveillance system in Maranhão and Brazil, as it is likely that a similar data quality issue exists in other state municipalities and in the country. In Brazil, the COVID-19 pandemic revealed flaws that point to years of underfunding and undervaluation of mortality and epidemiologic surveillance data.
We suggest that researchers and managers investigate the quality of mortality data available in their city or country, to check for reporting delays that may affect tracking the COVID-19 pandemic dynamics. In places where the quality of mortality data is low, we recommend that local epidemiological surveillance should seek to reduce or solve this issue by relegating the analysis of suspected deaths to the background. We also suggest disclosing detailed information, such as the number of suspected deaths and date of death. Finally, researchers and/or managers who intend to use death information in statistical models should check for reporting delay and account for it in their estimates. To this end, the nowcasting technique proved to be somewhat effective in improving the quality of mortality data, allowing the estimation of dynamic transmission parameters that best fit the epidemiological situation at the pandemic peak. Some time after the peak, however, when the surveillance system was less overloaded, nowcasting hardly improved the monitoring of pandemic dynamics.
In conclusion, improved up-to-date reporting of deaths is mandatory for better monitoring of transmission dynamics. More investments to improve epidemiologic surveillance systems are urgently needed.