Feasibility of very short-term forecast models for COVID-19 hospital-based surveillance

Edson Zangiacomi Martinez Afonso Dinis Costa Passos Antônio Fernando Cinto Andreia Cássia Escarso Rosane Aparecida Monteiro Jorgete Maria e Silva Fernando Bellissimo-Rodrigues Davi Casale Aragon About the authors

Abstract

INTRODUCTION:

We evaluated the performance of Bayesian vector autoregressive (BVAR) and Holt’s models to forecast the weekly COVID-19 reported cases in six units of a large hospital.

METHODS:

Cases reported from epidemiologic weeks (EW) 12-37 were selected as the training period, and from EW 38-41 as the test period.

RESULTS:

The models performed well in forecasting cases within one or two weeks following the end of the time-series, but forecasts for a more distant period were inaccurate.

CONCLUSIONS:

Both models offered reasonable performance in very short-term forecasts for confirmed cases of COVID-19.

Keywords:
COVID-19; Coronavirus disease; Forecasting; Statistical models; Epidemiology

The coronavirus disease (COVID-19) pandemic has already claimed more than 190,000 lives in Brazil at the time of this writing. Among the many efforts made to increase knowledge about this new disease and enable strategies for its mitigation, mathematical and statistical models can provide useful information about the dynamics of COVID-19. These approaches include susceptible-infected-recovered (SIR) models and their extensions, exponential smoothing, and models based on S-shaped curves, such as the logistic, Gompertz, and Richards curves11. Martinez EZ, Aragon DC, Nunes AA. Long-term forecasts of the COVID-19 epidemic: a dangerous idea. Rev Soc Bras Med Trop 2020;53:e20200481. doi: 10.1590/0037-8682-0481-2020.
https://doi.org/10.1590/0037-8682-0481-2...
. These models are used to describe the temporal variations of an outbreak and can provide short-term forecasts of new cases of the disease. Furthermore, when the aim is to evaluate two or more simultaneous time-series of events, we can consider the use of multivariate approaches such as vector autoregressive (VAR) models22. Enders W. Applied Econometric Time Series. 3rd ed. New York: John Wiley & Sons; 2010.. VAR models are very useful, especially in the field of economics22. Enders W. Applied Econometric Time Series. 3rd ed. New York: John Wiley & Sons; 2010., but they also have applications in epidemiology33. Oliveira JF, Rodrigues MS, Skalinski LM, Santos AE, Costa LC, Cardim LL, et al. Interdependence between confirmed and discarded cases of dengue, chikungunya and Zika viruses in Brazil: A multivariate time-series analysis. PloS One 2020; 15(2):e0228347. doi: 10.1371/journal.pone.0228347.
https://doi.org/10.1371/journal.pone.022...
, including the modeling of COVID-19 data44. Khan F, Saeed A, Ali S. Modelling and forecasting of new cases, deaths and recover cases of COVID-19 by using Vector Autoregressive model in Pakistan. Chaos Solitons Fractals. 2020; 140:110189. doi: 10.1016/j.chaos.2020.110189.
https://doi.org/10.1016/j.chaos.2020.110...
.

In this study, we used a multivariate time-series analysis based on a Bayesian extension of the VAR model in order to describe the weekly number of confirmed cases of COVID-19 reported in six units of the Clinical Hospital of the Ribeirão Preto Medical School (HCFMRP), University of São Paulo (USP), and to explore the possibility of very short-term forecasts. Multivariate time-series data can provide more information than univariate time-series data. VAR models and their extensions allow us to investigate how each variable impacts the other in each separate time-series, and this can improve the forecasting process. We compared the forecast accuracy of this multivariate analysis with those obtained by exponential smoothing based on Holt's method.

We used data from the Center of Epidemiological and Hospital Surveillance (NVEH) of the HCFMRP. By convention, epidemiological weeks (EW) run from Sunday to Saturday, and we considered the weekly number of confirmed cases of COVID-19 reported from EW 12 (March 15-21, 2020) to EW 41 (October 4-10, 2020). The HCFMRP is a large hospital complex with many health departments and 1,194 general beds in total. There were 45,866 hospitalizations and 41,377 surgeries in 2019. The health units of the HCFMRP included in the present study were the central unit located on the USP campus (here referred as "Campus"), the State Hospital of Américo Brasiliense (HEAB), the State Hospital of Ribeirão Preto (HERP), the State Hospital of Serrana (HES), the Reference Center in Women's Health (MATER), and the Emergency Unity (UE). Four of these units are located in Ribeirão Preto, a medium-sized city in the northwest region of the state of São Paulo, Brazil, with approximately 700,000 inhabitants, and the other two units are located in the cities of Américo Brasiliense and Serrana, located near Ribeirão Preto. Patients suspected of infection were tested for COVID-19 by reverse transcription polymerase chain reaction (RT-PCR). This was a study conducted with anonymized secondary data and therefore did not require approval from the Human Research Ethics Committee.

Formally, a time-series is defined as a collection of random variables {Y t }, where the time index t assumes integer values t = 1, 2, 3, and so on. Let us consider a K-dimensional multivariate time-series denoted by Yt = (Y 1t, …, Y Kt ). The VAR model, in this matrix form, is given by Yt=B0+B1Yt-1+B2Yt-2++B2Yt-p+et, where p is the order of the autoregressive process (lag), B0 = (b 10, b 20, ..., b K0 ) is a K×1 column vector of coefficients, Bj are KxK matrices of coefficients (j = 1,...,p), and e t = (e 1(t) , e 2(t) , ..., e K(t) ) are the errors of the model22. Enders W. Applied Econometric Time Series. 3rd ed. New York: John Wiley & Sons; 2010.,33. Oliveira JF, Rodrigues MS, Skalinski LM, Santos AE, Costa LC, Cardim LL, et al. Interdependence between confirmed and discarded cases of dengue, chikungunya and Zika viruses in Brazil: A multivariate time-series analysis. PloS One 2020; 15(2):e0228347. doi: 10.1371/journal.pone.0228347.
https://doi.org/10.1371/journal.pone.022...
. It is often assumed that the errors follow a multivariate normal distribution with a vector mean of 0 and variance-covariance matrix S. An autoregressive model of order p was used, meaning that the p preceding values are used to predict the next value. In this study, we consider a multivariate set of K = 6 time series, each relative to a health unit of HCFMRP. The length of each series was 30 weeks, considering data until October 10, 2020. The estimation of a VAR model uses an equation-by-equation approach, where ordinary least squares (OLS) regressions were used for each equation. One of the prerequisites for the estimation of a VAR model is that the analyzed time-series are stationary, or that they show similar behavior throughout their duration. In this case, their mean and variance remained unchanged with time. Brooks55. Brooks C. Introductory Econometrics for Finance. 3rd ed. Cambridge: Cambridge University Press; 2014. argues that, “if one wishes to use hypothesis tests, either singly or jointly, to examine the statistical significance of the coefficients, then it is essential that all of the components in the VAR are stationary.” As we used the cumulative number of COVID-19 cases to conduct model fitting, variables are obviously non-stationary given the increasing behavior of the corresponding curves. As an alternative, we can use the Bayesian vector autoregressive (BVAR) model. BVAR models use Bayesian methods to estimate the parameters of the VAR model, and according to Holden66. Holden K. Vector auto regression modeling and forecasting. J Forecast. 1995; 14(3):159-66. doi:10.1002/for.3980140302.
https://doi.org/10.1002/for.3980140302...
, some authors have argued that this approach may be applied to non-stationary variables. For example, in a study discussing the asymptotic distribution theory for statistics from autoregressive models with a unit root, including VAR models, Sims et al.77. Sims CA, Stock JH, Watson MW. Inference in linear time series models with some unit roots. Econometrica 1990; 58(1):113-44. doi:10.2307/2938337.
https://doi.org/10.2307/2938337...
claim that “because the Bayesian approach is entirely based on the likelihood function, which has the same Gaussian shape regardless of the presence of nonstationarity, Bayesian inference need take no special account of nonstationarity.” Gupta and Kotzé88. Gupta R, Kotzé K. The role of oil prices in the forecasts of South African interest rates: A Bayesian approach. Energy Econ. 2017;61:270-8. doi: 10.1016/j.eneco.2016.11.017.
https://doi.org/10.1016/j.eneco.2016.11....
also provide some discussion on this.

The Akaike information criterion (AIC) values for different models with lag length (p) ranging from 1 to 6 are, respectively, given by 969.8, 898.6, 880.2, 761.5, 626.4, and 839.9. Models with p > 6 were not fitted, because high-order lags may overfit the data. Based on these AIC values, we decided to consider p = 5, because models with lower AIC values are preferred. Figure 1 describes the time-series for the cumulative number of cases reported in each health unit and the predicted values obtained from the BVAR model, based on p = 1 and p = 5. These models were fitted using the Metropolis-Hastings algorithm in the R package “BVAR”, based on 100,000 simulated samples taken after discarding an initial 5,000 burn-in period. Convergence of the simulated samples was verified by trace and density plots, and correlation between successive samples was inspected with autocorrelation plots. The BVAR package uses the Minnesota prior as baseline99. Carriero A, Clark TE, Marcellino M. Bayesian VARs: specification choices and forecast accuracy. J Appl Econometrics. 2015;30(1):46-73. doi: 10.1002/jae.2315.
https://doi.org/10.1002/jae.2315...
, which is a prior distribution that transforms the VAR model into a random walk process for each variable.

Figure 1 shows a sudden increase in the number of hospitalizations at the HERP in EW 31, which is accompanied by a decrease in the number of hospitalizations in the other units with the most beds. This decrease is not so clear in the corresponding graphs because the cumulative number of hospitalizations is relatively large in these units, but this situation describes the possible reallocations of patients between the units of HCFMRP. In this way, when considering interrelations between time-series variables, the BVAR model can provide an effective method to cope with these effects.

FIGURE 1:
Cumulative number of COVID-19 cases in each health unit of the HCFMRP, and fitted values obtained from the BVAR model based on lags of length p = 1 and p = 5. HCFMRP: Clinical Hospital of the Ribeirão Preto Medical School; BVAR: Bayesian vector autoregressive; HEAB: State Hospital of Américo Brasiliense; HERP: State Hospital of Ribeirão Preto; HES: State Hospital of Serrana; MATER: Reference Center in Women's Health; UE: Emergency Unity.

Holt's method is a special case of the Holt-Winters exponential model in which seasonality is absent. In the equation for Holt’s method, the forecasted value of the series at time t is given by

Y ^ t = L t - 1 + T t - 1 ,

where L t is the estimated level given by

L t = α Y t + 1 - α L t - 1 + T t - 1 ,

Tt is the estimated slope given by

T t = β L t - L t - 1 + 1 - β T t - 1 ,

and α and β are smoothing parameters1010. Brockwell PJ, Davis RA. Introduction to Time Series and Forecasting. 3rd ed. New York: Springer; 2016.,1111. Martinez EZ, Aragon DC, Nunes AA. Short-term forecasting of daily COVID-19 cases in Brazil by using the Holt’s model. Rev Soc Bras Med Trop . 2020;53:e20200283. doi: 10.1590/0037-8682-0283-2020.
https://doi.org/10.1590/0037-8682-0283-2...
. Holt's model can be applied by using the “holt” function included in the R package “forecast”.

We then assessed the feasibility of using BVAR and Holt’s models to obtain short-term forecasts for weekly COVID-19 cases in each health unit. Given that Holt's method is a univariate approach, we adjusted independent models for each health unit. We considered weekly reports from the date on which the first case was notified in one of the units of HCFMRP up to EW 37 as the training period. The values of the validation period were the corresponding observations from EW 38-41. Comparisons between the forecasted and actual values were based on the mean absolute percent error (MAPE), and Theil's U entropy coefficient was used as a measure of out-of-sample forecasting accuracy1212. Theil H. Applied economic forecasting. 1st ed. Amsterdam: North-Holland Publishing Company; 1971.. The lower the MAPE value, the better the performance of the corresponding model. When Theil's coefficient is greater than 1, the forecasts under consideration are less accurate than those obtained by a naïve approach, or a simple method in which the forecasts are equal to the last observed value.

Table 1 shows the weekly observed COVID-19 cases and the corresponding forecasts obtained from the BVAR and Holt’s models (with 95% prediction intervals) in the validation period. Table 2 shows the MAPE and Theil's U values considering forecasts of length h = 3 and h = 4 weeks. The BVAR model appears to perform better in terms of MAPE in some health units, such as Campus and HES, while the univariate Holt’s model appears to perform better in other units. However, in a general manner, both models are inefficient in making forecasts over periods longer than 2 weeks, given that Theil's U values are higher than 1 for some health units considering forecasts of length h = 3 or higher (Table 2). In the context of our study, this allows us to conclude that both BVAR and Holt’s models are useful for predicting the future number of confirmed cases of COVID-19 in the week following the end of an observed time-series, but forecasts for a more distant period can be very inaccurate.

TABLE 1:
Weekly COVID-19 cases and the corresponding forecasts from the BVAR and Holt’s models (with 95% prediction intervals), from epidemiological weeks 38-41.

TABLE 2:
Mean absolute percent errors (MAPE) and Theil's U entropy coefficients for forecasting weekly COVID-19 cases in each health unit of the HCFMRP, considering forecasts of length h = 3 and h = 4.

Even in a hospital context, the dynamics of COVID-19 can change very quickly during the time course of disease in response to a large number of factors, including changes in mitigation strategies for transmission in the community and people's adherence to them, and the availability of tests for essential screening1313. Coroiu A, Moran C, Campbell T, Geller AC. Barriers and facilitators of adherence to social distancing recommendations during COVID-19 among a large international sample of adults. PloS One. 2020;15(10):e0239795. doi: 10.1371/journal.pone.0239795.
https://doi.org/10.1371/journal.pone.023...
. The future of an epidemic in a population during its course is thus hard to predict, and mathematical and statistical models are only capable of simulating what can happen in the future if the conditions observed in the present do not change1414. Jewell NP, Lewnard JA, Jewell BL. Predictive mathematical models of the COVID-19 pandemic: underlying principles and value of projections. JAMA. 2020;323(19):1893-4. doi: 10.1001/jama.2020.6585.
https://doi.org/10.1001/jama.2020.6585...
. In the present study, we have data reported in weeks, and a period of two weeks or more can be large enough to produce substantial changes in exogenous variables that affect the dynamics of the disease. Although important, these limitations do not reduce the usefulness of the study in obtaining very short-term forecasts for confirmed cases of COVID-19, since out-of-sample predictions for the next one or two weeks can provide useful information for healthcare planning and distribution of resources. In economics, very short-term forecasting is sometimes called nowcasting (a contraction of "now" and "forecasting"), where forecasts for a very near future are useful to monitor changes in variables of interest in real time1515. Giannone D, Reichlin L, Small D. Nowcasting: The real-time informational content of macroeconomic data. J Monet Econ. 2008;55(4):665-76. doi: 10.1016/j.jmoneco.2008.05.010.
https://doi.org/10.1016/j.jmoneco.2008.0...
. In the context of the HCFMRP, accurate short-term forecasts can create alerts for situations of peak demand for hospitalizations during the pandemic and can help health managers optimize the costs of supplies and staff. Our results suggest that the BVAR model is a feasible tool to obtain these short-term forecasts. In addition, the “BVAR” package in R offers researchers with programming knowledge a powerful tool for the study of multivariate time-series in epidemiology.

The limitations of the present study include uncertainties in the diagnostic accuracy of the tests used to determine the COVID-19 status and, from a statistical perspective, the fact that the time-series are relatively short and the weekly case counts were sometimes small, which can make inferences difficult. Despite these shortcomings, both BVAR and Holt’s models offered reasonable performances in very short-term forecasts for confirmed cases of COVID-19 and can be valuable tools for disease surveillance.

REFERENCES

  • 1
    Martinez EZ, Aragon DC, Nunes AA. Long-term forecasts of the COVID-19 epidemic: a dangerous idea. Rev Soc Bras Med Trop 2020;53:e20200481. doi: 10.1590/0037-8682-0481-2020.
    » https://doi.org/10.1590/0037-8682-0481-2020
  • 2
    Enders W. Applied Econometric Time Series. 3rd ed. New York: John Wiley & Sons; 2010.
  • 3
    Oliveira JF, Rodrigues MS, Skalinski LM, Santos AE, Costa LC, Cardim LL, et al. Interdependence between confirmed and discarded cases of dengue, chikungunya and Zika viruses in Brazil: A multivariate time-series analysis. PloS One 2020; 15(2):e0228347. doi: 10.1371/journal.pone.0228347.
    » https://doi.org/10.1371/journal.pone.0228347
  • 4
    Khan F, Saeed A, Ali S. Modelling and forecasting of new cases, deaths and recover cases of COVID-19 by using Vector Autoregressive model in Pakistan. Chaos Solitons Fractals. 2020; 140:110189. doi: 10.1016/j.chaos.2020.110189.
    » https://doi.org/10.1016/j.chaos.2020.110189
  • 5
    Brooks C. Introductory Econometrics for Finance. 3rd ed. Cambridge: Cambridge University Press; 2014.
  • 6
    Holden K. Vector auto regression modeling and forecasting. J Forecast. 1995; 14(3):159-66. doi:10.1002/for.3980140302.
    » https://doi.org/10.1002/for.3980140302
  • 7
    Sims CA, Stock JH, Watson MW. Inference in linear time series models with some unit roots. Econometrica 1990; 58(1):113-44. doi:10.2307/2938337.
    » https://doi.org/10.2307/2938337
  • 8
    Gupta R, Kotzé K. The role of oil prices in the forecasts of South African interest rates: A Bayesian approach. Energy Econ. 2017;61:270-8. doi: 10.1016/j.eneco.2016.11.017.
    » https://doi.org/10.1016/j.eneco.2016.11.017
  • 9
    Carriero A, Clark TE, Marcellino M. Bayesian VARs: specification choices and forecast accuracy. J Appl Econometrics. 2015;30(1):46-73. doi: 10.1002/jae.2315.
    » https://doi.org/10.1002/jae.2315
  • 10
    Brockwell PJ, Davis RA. Introduction to Time Series and Forecasting. 3rd ed. New York: Springer; 2016.
  • 11
    Martinez EZ, Aragon DC, Nunes AA. Short-term forecasting of daily COVID-19 cases in Brazil by using the Holt’s model. Rev Soc Bras Med Trop . 2020;53:e20200283. doi: 10.1590/0037-8682-0283-2020.
    » https://doi.org/10.1590/0037-8682-0283-2020
  • 12
    Theil H. Applied economic forecasting. 1st ed. Amsterdam: North-Holland Publishing Company; 1971.
  • 13
    Coroiu A, Moran C, Campbell T, Geller AC. Barriers and facilitators of adherence to social distancing recommendations during COVID-19 among a large international sample of adults. PloS One. 2020;15(10):e0239795. doi: 10.1371/journal.pone.0239795.
    » https://doi.org/10.1371/journal.pone.0239795
  • 14
    Jewell NP, Lewnard JA, Jewell BL. Predictive mathematical models of the COVID-19 pandemic: underlying principles and value of projections. JAMA. 2020;323(19):1893-4. doi: 10.1001/jama.2020.6585.
    » https://doi.org/10.1001/jama.2020.6585
  • 15
    Giannone D, Reichlin L, Small D. Nowcasting: The real-time informational content of macroeconomic data. J Monet Econ. 2008;55(4):665-76. doi: 10.1016/j.jmoneco.2008.05.010.
    » https://doi.org/10.1016/j.jmoneco.2008.05.010

  • Financial Support: This study was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, process 312161/2019-0).

Publication Dates

  • Publication in this collection
    10 Feb 2021
  • Date of issue
    2021

History

  • Received
    24 Oct 2020
  • Accepted
    12 Jan 2021
Sociedade Brasileira de Medicina Tropical - SBMT Caixa Postal 118, 38001-970 Uberaba MG Brazil, Tel.: +55 34 3318-5255 / +55 34 3318-5636/ +55 34 3318-5287, http://rsbmt.org.br/ - Uberaba - MG - Brazil
E-mail: rsbmt@uftm.edu.br