
Imputation method to reduce undetected severe acute respiratory infection cases during the coronavirus disease outbreak in Brazil

Abstract

INTRODUCTION:

The coronavirus disease (COVID-19) outbreak has overburdened the surveillance of severe acute respiratory infections (SARIs), including the laboratory network. This study aimed to correct for the absence of laboratory results among reported SARI deaths.

METHODS:

The imputation method was applied for SARI deaths without laboratory information using clinico-epidemiological characteristics.

RESULTS:

Of 84,449 SARI deaths, 51% were confirmed with COVID-19 while 3% with other viral respiratory diseases. After the imputation method, 95% of deaths were reclassified as COVID-19 while 5% as other viral respiratory diseases.

CONCLUSIONS:

The imputation method was a useful and robust solution (sensitivity and positive predictive value of 98%) for missing values, using clinical and epidemiological characteristics.

Keywords:
COVID-19; SARI; Laboratory test; Signs and symptoms; Imputation method

The coronavirus disease (COVID-19) pandemic had caused more than 10 million cases and 500,000 deaths worldwide by June 2020 [1]. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been spreading fast globally, causing many severe cases and deaths. This virus has a higher basic reproduction number (R0) and case fatality rate (CFR) than influenza (R0: 2.5-3.3 and CFR: 0.4%-2.9% versus R0: 1.2-2.3 and CFR: 0.15%-0.25%, respectively) [2-5]. In Brazil, the first confirmed case was reported on February 25, 2020, in Sao Paulo City, and at least one case has since been reported in all Brazilian states and almost all municipalities (96%) [6].

Brazil has a surveillance system for severe acute respiratory illness (SARI) that operates at three levels of government (federal, state, and municipal) in public and private health units, and notification of SARI has been mandatory since 2009. Reported cases include patients hospitalized because of SARI at any health service and mild respiratory cases reported by sentinel networks, recorded in an online database (Influenza Epidemiological Surveillance Information System in Brazil, SIVEP-Gripe). After the discovery of SARS-CoV-2 in China, suspected cases in Brazil were reported using the REDCap platform, which remained in use until the country reached 1,000 confirmed cases; subsequently, a new system (e-SUS) was developed to report mild respiratory cases, while SARI cases continued to be reported in SIVEP-Gripe. Because of its continuity and consistency, SIVEP-Gripe has been maintained as the official system to report and monitor severe cases of COVID-19, including deaths from COVID-19 regardless of hospitalization.

Although SIVEP-Gripe is an online platform, inconsistencies in monitoring and in the timeliness of case closure persist. In addition, the Ministry of Health has reported a high percentage of deaths from SARI without a diagnosis, called “non-specified SARI,” and has alerted health authorities to the possible activity of other respiratory viruses in the Brazilian population. Therefore, this study aimed to investigate the clinico-epidemiological characteristics of deaths from SARI reported in the Influenza Epidemiological Surveillance Information System in Brazil (SIVEP-Gripe) to correct for the absence of robust laboratory results for COVID-19.

We used deaths from SARI reported in the Influenza Epidemiological Surveillance Information System in Brazil (SIVEP-Gripe) during the COVID-19 outbreak from January 1 to July 28, 2020. Death records were selected using the case evolution variable.

All reported cases were classified as follows: (i) COVID-19, with laboratory confirmation through the reverse-transcriptase polymerase chain reaction (RT-PCR) for SARS-CoV-2; (ii) undetected, with laboratory confirmation through RT-PCR for other viruses; and (iii) missing value, with no confirmation through RT-PCR, an indeterminate result, or a test still in processing. This classification was the response variable in the regression model, and its missing values were subsequently imputed.

Before applying the data imputation method, we performed a logistic regression analysis to identify the variables related to the response. First, we fitted univariate models using the following predictors: signs and symptoms (fever, cough, throat pain, dyspnea, respiratory distress, O2 saturation < 95%, diarrhea, and vomiting), comorbidities (chronic cardiovascular disease, chronic hematological disease, chronic liver disease, asthma, diabetes mellitus, chronic neurological disease, other chronic pneumopathy, immunodeficiency/immunodepression, chronic kidney disease, and obesity), hospitalization (yes/no), intensive care unit stay (yes/no), ventilation support (invasive, non-invasive, and none), chest X-ray, sex, and age group (<10 years, 10 to 39 years, 40 to 59 years, 60 to 69 years, and 70 years or more). The multiple logistic regression model was built from the variables with a p-value below 0.10 in the univariate models, and a stepwise method was applied using the Akaike information criterion, the Bayesian information criterion, and the deviance. Subsequently, cases classified as “missing value” were subjected to the data imputation method, using as predictors the variables selected in the multiple logistic regression.
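The univariate screening step can be illustrated for a single binary predictor. For a yes/no predictor, the univariate logistic regression coefficient equals the log odds ratio of the 2x2 table, so a Wald test on that table gives the same screening decision. The sketch below is in Python rather than the R used in the study; the function name `odds_ratio_screen` and all counts are hypothetical, and only the 0.10 threshold comes from the text.

```python
import math

def odds_ratio_screen(a, b, c, d, alpha=0.10):
    """Wald test for one binary predictor (hypothetical helper).

    a: predictor present, COVID-19      b: predictor present, other virus
    c: predictor absent,  COVID-19      d: predictor absent,  other virus
    """
    log_or = math.log((a * d) / (b * c))           # log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    return {
        "odds_ratio": math.exp(log_or),
        "p_value": p_value,
        "keep": p_value < alpha,  # candidate for the multiple model
    }

# Hypothetical counts, not taken from the study.
associated = odds_ratio_screen(300, 10, 200, 20)
unrelated = odds_ratio_screen(50, 50, 50, 50)
print(associated["odds_ratio"], associated["keep"])
print(unrelated["odds_ratio"], unrelated["keep"])
```

Predictors with more than two levels (age group, ventilation support) would need a full logistic fit instead of a single 2x2 table.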

We applied the multiple imputation method to obtain complete information for the “missing value” cases in the classification of SARI deaths. Imputation was performed using the additive regression method: a flexible additive model (a nonparametric regression method) was fitted on samples drawn with replacement from the original data, and the missing values (dependent variable) were predicted from the non-missing values (independent variables selected by the multiple logistic regression) [7-9].
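The general idea of bootstrap-based multiple imputation can be sketched as follows. This is a deliberately simplified stand-in written in Python, not the additive regression algorithm of the R package Hmisc used in the study: instead of a flexible additive model, it estimates class frequencies within each predictor stratum of a bootstrap sample and draws the missing class from them. The function `impute_classes`, the field names, and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def impute_classes(records, predictors, target="cls", n_imputations=5, seed=0):
    """Impute a missing class (None) from bootstrap stratum frequencies."""
    rng = random.Random(seed)
    complete = [r for r in records if r[target] is not None]
    missing_idx = [i for i, r in enumerate(records) if r[target] is None]
    draws = {i: [] for i in missing_idx}
    for _ in range(n_imputations):
        # Sample the complete cases with replacement.
        boot = [rng.choice(complete) for _ in complete]
        overall = Counter(r[target] for r in boot)
        by_stratum = defaultdict(Counter)
        for r in boot:
            by_stratum[tuple(r[p] for p in predictors)][r[target]] += 1
        for i in missing_idx:
            key = tuple(records[i][p] for p in predictors)
            # Fall back to overall frequencies for strata unseen in this sample.
            counts = by_stratum.get(key) or overall
            labels = sorted(counts)
            weights = [counts[label] for label in labels]
            draws[i].append(rng.choices(labels, weights=weights)[0])
    # Final class: the most frequent draw across imputations.
    return {i: Counter(d).most_common(1)[0][0] for i, d in draws.items()}

# Toy data, not taken from the study.
records = (
    [{"age": "70+", "cls": "covid"}] * 20
    + [{"age": "<10", "cls": "other"}] * 20
    + [{"age": "70+", "cls": None}, {"age": "<10", "cls": None}]
)
imputed = impute_classes(records, predictors=["age"])
print(imputed)
```

Pooling by majority vote is one simple choice for a categorical outcome; the study does not state how its imputed datasets were combined.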

To validate the data imputation method, we selected a random sample of SARI deaths confirmed as COVID-19 or as other viral respiratory diseases. We randomly set the classification of 30% of these cases to missing and applied the imputation method. The imputed values were then compared with the observed values. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated to quantify this validation. Furthermore, the Kappa test was performed to measure the concordance between the imputed and observed values. The significance level was set at 5% for all analyses. All data were processed using R software, and the data imputation method was performed using the R package Hmisc.
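The validation measures named above all follow from a 2x2 table of observed versus imputed classes. The sketch below, in Python rather than the R used in the study, shows the standard formulas, including Cohen's kappa; the function names and the counts are hypothetical and do not reproduce the study's results.

```python
def confusion_counts(observed, imputed, positive="covid"):
    """Count true/false positives and negatives, treating `positive` as the event."""
    tp = fp = fn = tn = 0
    for obs, imp in zip(observed, imputed):
        if imp == positive and obs == positive:
            tp += 1
        elif imp == positive:
            fp += 1
        elif obs == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def validation_metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    agreement = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (agreement - chance) / (1 - chance),
    }

# Hypothetical counts, not taken from the study.
metrics = validation_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```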

In Brazil, from January 1 to July 28, 2020, 84,449 deaths from SARI were reported. Of these, 45,321 (54%) were confirmed using RT-PCR for some respiratory virus, of which 42,981 (95%) were confirmed as COVID-19. The proportion of confirmed COVID-19 cases differed across Brazilian states, with the lowest in Mato Grosso do Sul (19%) and the highest in Acre (91%) (Table 1).

Among all deaths reported in Brazil, the numbers of cases undetected for respiratory viruses, indeterminate in RT-PCR, not tested, in processing, and without information were 21,770 (26%), 553 (1%), 2,829 (3%), 6,404 (8%), and 7,571 (9%), respectively; all of these cases were considered as “missing value,” totaling 39,128 (46%) records. Important variations were also observed across Brazilian states; the five states with the highest proportions of missing values were Minas Gerais (76%), Mato Grosso do Sul (75%), Rio Grande do Sul (73%), Paraná (73%), and Santa Catarina (69%) (Table 1).

TABLE 1:
Death from severe acute respiratory illness classified by laboratory results reported by Brazilian states (Brazil, January to July, 2020)

In the univariate logistic regression model, the age group was associated with COVID-19, and the odds ratio increased with age. The signs and symptoms that showed significant associations were respiratory distress, fever, cough, throat pain, and dyspnea, all associated with lower odds of being detected with COVID-19. Only four underlying health conditions showed significant associations with COVID-19: chronic cardiovascular disease, diabetes mellitus, chronic kidney disease, and obesity. Individuals who needed intensive care were more likely to be detected with COVID-19. In the multiple logistic regression, only five variables remained in the final model: age group, with age of 40 years or above having approximately eight times the odds of being detected with COVID-19 compared with age below 10 years; a 33% chance for individuals with respiratory distress; 10% and 20% more chance for individuals with chronic cardiovascular disease and diabetes mellitus, respectively; and an increased chance in individuals who required ventilation support (32% for invasive; 38% for non-invasive) (Table 2).

TABLE 2:
Demographic information and logistic regression for death from severe acute respiratory illness confirmed to be coronavirus disease in the reverse-transcriptase polymerase chain reaction test (Brazil, January to July, 2020).

Using the variables selected by the multiple logistic regression, the imputation method was applied to all records classified as “missing value.” Of these records, the data imputation method could classify 37,980 (97%): 1,994 (2% of all SARI deaths) were classified as other viral respiratory diseases (undetected for COVID-19), and 35,986 (43% of all SARI deaths) were classified as COVID-19. Therefore, of the deaths from SARI that occurred in Brazil from January 1 to July 28, 2020, 95% were reclassified as COVID-19 and 5% as some other viral respiratory disease (not COVID-19). Nearly all Brazilian states and the Federal District had at least 90% of deaths from SARI classified as COVID-19; only Maranhão (15%), Mato Grosso (14%), and Mato Grosso do Sul (11%) had more than 10% of SARI deaths classified as other viral respiratory diseases by the data imputation method (Table 3).
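The percentages in this paragraph can be checked from the counts reported in the text. The short Python calculation below uses only those counts; the one assumption is that the 95% and 5% figures are taken over deaths that have a class after imputation (laboratory-confirmed plus imputed), since the text does not state the denominator. Variable names are illustrative.

```python
# Counts as reported in the text.
total_deaths = 84_449
confirmed_any_virus = 45_321
confirmed_covid = 42_981
missing_value = 39_128
imputed_total = 37_980
imputed_covid = 35_986
imputed_other = 1_994

def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

confirmed_other = confirmed_any_virus - confirmed_covid
# Assumed denominator: deaths with a class after imputation.
classified = confirmed_any_virus + imputed_total
covid_after = confirmed_covid + imputed_covid
other_after = confirmed_other + imputed_other

print("imputed share of missing value:", pct(imputed_total, missing_value))
print("imputed COVID-19, share of all deaths:", pct(imputed_covid, total_deaths))
print("imputed other virus, share of all deaths:", pct(imputed_other, total_deaths))
print("COVID-19 after imputation:", pct(covid_after, classified))
print("other virus after imputation:", pct(other_after, classified))
print("still unclassified:", missing_value - imputed_total)
```

Under this assumption the script reproduces the 97%, 43%, 2%, 95%, and 5% figures quoted above.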

TABLE 3:
Imputed classification of death from severe acute respiratory illness by Brazilian states (Brazil, January to July, 2020).

In the validation of the data imputation method, the simulation showed high sensitivity (99%) and positive predictive value (99%) and substantial specificity (71%) and negative predictive value (73%). Moreover, the Kappa test showed substantial concordance between the imputed and the observed SARI classifications (K = 0.71; p-value < 0.001) (Supplementary Table 1).

The absence of test information, which leaves cases of viral respiratory disease undetected, biases the information system and hinders understanding of the spread of COVID-19 and other viral respiratory diseases in Brazil, because only information about detected tests is reported in SIVEP-Gripe. Almost half of SARI deaths have an unknown cause: 26% had undetected RT-PCR results, although we do not know which respiratory viruses were tested, and more than 20% of deaths were not tested.

The simulation of the data imputation method against real values showed that it is a useful and robust solution to the problem of missing values or undetected results for which the tested respiratory viruses are not identified, using the clinical and epidemiological variables. The method presented high sensitivity and positive predictive value and substantial specificity and negative predictive value, as well as substantial concordance with the real values in the simulation. Another way to validate this method is to select some imputed cases and review their medical records for additional examinations (X-ray, tomography, etc.) that help confirm the cases, and to retest these cases using a different methodology suitable for laboratory collection. These estimates should be confirmed with empirical data as the quality of the information systems improves.

The main limitation of this method is its dependence on the underlying data: if the quality of the information is not reasonably good, the output of the imputation inherits this bias. Given the speed at which the disease spread in the country, surveillance teams may not have been able to fill out the epidemiological history completely. This can explain the difference observed in some states that showed less than 90% of deaths classified as COVID-19. These states usually have poorer completion of the investigation form (in Maranhão, missing values for the variables ranged from 12% to 74%, while in Mato Grosso and Mato Grosso do Sul they ranged from 3% to 67%).

ACKNOWLEDGMENTS

We thank the Department of Immunizations and Transmissible Diseases of the Brazilian Ministry of Health for supporting this work by making the database freely available.

REFERENCES

1. World Health Organization (WHO). Coronavirus disease (COVID-19) situation reports. 2020.
2. Izadi N, Taherpour N, Mokhayeri Y, Sotoodeh Ghorbani S, Rahmani K, Hashemi Nazari SS. The epidemiologic parameters for COVID-19: A Systematic Review and Meta-Analysis [Internet]. Epidemiology; 2020 May [cited 2020 Jul 21]. Available at: http://medrxiv.org/lookup/doi/10.1101/2020.05.02.20088385
3. Alimohamadi Y, Taghdir M, Sepandi M. Estimate of the Basic Reproduction Number for COVID-19: A Systematic Review and Meta-analysis. J Prev Med Pub Health. 2020;53(3):151-7.
4. Girard MP, Tam JS, Assossou OM, Kieny MP. The 2009 A (H1N1) influenza virus pandemic: A review. Vaccine. 2010;28(31):4895-902.
5. Boëlle P-Y, Ansart S, Cori A, Valleron A-J. Transmission parameters of the A/H1N1 (2009) influenza virus pandemic: a review: Transmission of A/H1N1 (2009) flu pandemic. Influenza Other Respir Viruses. 2011;5(5):306-16.
6. Souza WM de, Buss LF, da Silva Candido D, Carrera JP, Li S, Zarebski A, et al. Epidemiological and clinical characteristics of the early phase of the COVID-19 epidemic in Brazil [Internet]. Epidemiology; 2020 Apr [cited 2020 Jul 21]. Available at: http://medrxiv.org/lookup/doi/10.1101/2020.04.25.20077396
7. Van Buuren S, Brand JPL, Groothuis-Oudshoorn CGM, Rubin DB. Fully conditional specification in multivariate imputation. J Stat Comput Simul. 2006;76(12):1049-64.
8. Morris TP, White IR, Royston P. Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med Res Methodol. 2014;14(1):75.
9. Donders ART, van der Heijden GJMG, Stijnen T, Moons KGM. Review: A gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087-91.
  • Financial Support: JC received a grant under the research call “quick answer to COVID-19” from the Oswaldo Cruz Foundation, process/contract identification: 48111668950485.

Publication Dates

  • Publication in this collection
    14 Sept 2020
  • Date of issue
    2020

History

  • Received
    06 Aug 2020
  • Accepted
    17 Aug 2020
Sociedade Brasileira de Medicina Tropical - SBMT Caixa Postal 118, 38001-970 Uberaba MG Brazil, Tel.: +55 34 3318-5255 / +55 34 3318-5636/ +55 34 3318-5287, http://rsbmt.org.br/ - Uberaba - MG - Brazil
E-mail: rsbmt@uftm.edu.br