Worldwide COVID-19 spreading explained: traveling numbers as a primary driver for the pandemic

The spread of SARS-CoV-2 and the distribution of cases worldwide followed no clear biogeographic, climatic, or cultural trend. Conversely, the internationally busiest cities in all countries tended to be the hardest hit, suggesting a basic, mathematically neutral pattern in the early dissemination of the new coronavirus. We tested, by stepwise regression, whether the number of flight passengers per unit of time and the number of international frontiers could explain the number of cases of COVID-19 worldwide. The analyses used data as of 22 May 2020, a period when one could argue that early patterns of the pandemic establishment were still detectable, despite community transmission in various places. The number of passengers arriving in a country and the number of international borders significantly explained 49% of the variance in the distribution of the number of cases of COVID-19, and the number of passengers significantly explained 14.2% of the variance in cases per million inhabitants. Ecological neutral theory may explain a considerable part of the early distribution of SARS-CoV-2 and should be taken into consideration when defining preventive international actions before the next pandemic.

INTRODUCTION
The SARS-CoV-2 pandemic has spread around the world, but the patterns of dissemination, the number of cases and deaths per country, and the demographic trends are challenging to understand. The intensity of the outbreak among countries and continents cannot be fully explained by the local management of the disease, nor by demographic traits alone (Ferguson et al. 2020). Some studies have suggested that Asian and black communities, as well as males, are more vulnerable, irrespective of comorbidities, but such vulnerabilities have not defined any ethnogeographic pattern so far (Laurencin & McClinton 2020), nor any genetic pattern. Neither has culture: the countries considered to have the most rigorous hygiene habits (based on how many times people wash their hands per day) are evenly distributed from the hardest to the lightest hit, with Brazil and Germany among the most infected and Australia and Japan mildly infected (UNICEF & WHO 2018). According to UNICEF & WHO (2018), the countries in which people wash their hands least are China and Malaysia, and the latter had as many cases as Australia in the early months of the pandemic. Along with hand washing, oriental countries accustomed to previous virus outbreaks had already developed proper mask-wearing practices, which may have affected the patterns of local transmission in comparison to western countries. Finally, there would also be climatic issues, such as effects of temperature or humidity (Coelho et al. 2020, Pequeno et al. 2020, Sajadi et al. 2020), and pollution, which changes how successfully the virus spreads (Ogen 2020). Nevertheless, such environmental conditions are usually similar across broad regions; thus, they alone can hardly explain why neighbouring countries with similar climates and biomes are so distinctly affected, as, for instance, Iraq and Iran, or nearby cities such as New York (U.S.A.) and Toronto (Canada).
There are many hypotheses on the global pattern of distribution of COVID-19, ranging from population age to environment and governmental reactions; however, none of them is entirely conclusive (Fathizadeh et al. 2020, Ferguson et al. 2020). Besides, modelling in "war times" (sensu Vespignani et al. 2020) implies uncertainties and constant revision. We are experiencing an epidemiologically complex event, in which several of the aforementioned hypotheses need to be evaluated together. Despite the experience of previous pandemics and large epidemics, SARS-CoV-2 is unique in many aspects of its natural history, which we are getting to know in the course of the pandemic. The unpredictability of the starting conditions of outbreaks at a local scale, and the role of stochastic situations typical of an early host invasion, such as super-spreading events, increase the uncertainties within each affected human population and its epidemiologic dynamics (Vespignani et al. 2020). However, at the larger scale of disease dissemination across the globe, the typical pattern of the wealthiest and busiest cities being hit harder and first may provide the clue to an emergent property of the disease, which may have the most straightforward explanation: it hits harder where it hits more! Two recent models based on the air transportation networks of two large countries, Brazil and Mexico (Dattilo et al. 2020), have clearly shown that centrality in the network makes a city more vulnerable than another that is spatially close but less central in the network (Coelho et al. 2020). This prediction was confirmed by the outbreaks in Fortaleza, Rio de Janeiro, and, most severely, São Paulo, in Brazil, as well as Mexico City and Tijuana, in Mexico.
Furthermore, for Brazil, the aforementioned authors predicted, and alerted the government about, an unexpected case: the city of Manaus, one of the hubs of the Amazonian region, due to its poor hospital and health care conditions. In the early stage of the pandemic, Manaus was listed among the worst Brazilian cities in terms of incidence and mortality per 100 thousand inhabitants, along with other Amazonian cities (G1 2020). Manaus is a key regional hub for passengers in the Amazon region, receiving and redistributing most people coming from the South and Southeast regions of the country. Big cities and capitals that are only intermediate hubs or peripheral nodes in the Brazilian air transportation network, such as Belo Horizonte and Curitiba, were spared from a high number of cases or a fast initial infection rate. Such a slow outbreak start may have helped local authorities who applied sufficiently strict quarantine measures.
Other examples of distinct international scenarios, inside a similar biogeographic region and between countries of similar ethnic and cultural basis, are the significant number of cases in Iran (10th highest number of cases) and the very few in Iraq (position 67 in COVID-19 cases), or, similarly, the Dominican Republic (position 42 in cases) versus Haiti (position 116 in cases; ranking of cases collected from worldometers.info/coronavirus/ on May 22, 2020). These contrasting situations of biogeographically similar countries can also be explained by exposure to migrants and visitors. Conflict-ridden Iraq, which had 7,382,934 arriving flight passengers in 2018, had 2.6 times fewer visitors than Iran, with 19,403,070. Likewise, poor and severely earthquake-damaged Haiti is the 7th least visited country in the world, receiving nearly eight times fewer flight passengers than the neighbouring, tourism-oriented Dominican Republic (IATA 2019).
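The ratio quoted above can be checked directly from the two passenger figures. The following is a minimal sketch in Python; the variable names are ours, chosen only for illustration, and the figures are those cited from IATA (2019).

```python
# Arriving flight passengers in 2018, as quoted in the text (IATA 2019).
iraq_passengers_2018 = 7_382_934
iran_passengers_2018 = 19_403_070

# How many times more arriving passengers Iran received than Iraq.
ratio = iran_passengers_2018 / iraq_passengers_2018
print(f"Iran / Iraq passenger ratio: {ratio:.1f}")
```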
The transmission of SARS-CoV-2 occurs mainly through human contact, regardless of the presence of symptoms (Fathizadeh et al. 2020). Thus, the more people arrive from any previously infected place, the higher the chance of reaching the point of community transmission (Dattilo et al. 2020). Hence, despite the various contrasting acts of national authorities against the pandemic, and the demographic trends, culture, or ethnobehaviours that affect how safe citizens are inside a city, the mere rate of migration from contaminated regions could explain per se the number of cases a few months after the first case. Our hypothesis is therefore that international travelling may be a main driver of the early distribution of the pandemic, with the prediction that countries receiving more visitors from abroad in the same interval of time will suffer the impacts of pandemic transmission more severely.

MATERIALS AND METHODS
To investigate this hypothesis, we tested the total number of cases against the explanatory factor "number of passengers" per country, summed with the number of national passengers (expressed as passengers per 1 million, thus correcting for the largest countries). The moment chosen for ranking the number of cases per country was May 22nd, when most countries had had their first COVID-19 case more than one month earlier. Furthermore, total cases and deaths were highly correlated, both for absolute values and for values per 1 million (Pearson correlation 0.85 and 0.70, respectively). Despite the lower precision of the total number of cases compared to deaths, we chose to analyse the former to cover a broad range of countries, since many of them still had a negligible number of deaths at the chosen date. Finally, as the error in total cases is mainly due to underreporting (and as similar levels of underreporting may be expected for most countries worldwide), we assumed that using the logarithmic scale reduced the error variance sufficiently to compare records between countries and, thus, made the official case counts a proxy for the actual pattern of case distribution.
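As an illustration of the correlation check described above, the sketch below computes a Pearson coefficient between case and death counts on a logarithmic scale. The five count pairs are hypothetical placeholders, not the Worldometer records, the function name is ours, and applying the log transform before correlating is an assumption made here for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical country totals (placeholders, NOT the study data).
cases = [1200, 45000, 300, 160000, 8700]
deaths = [30, 2100, 4, 9500, 260]

# Log scale, as used in the study to damp the error variance of raw counts.
log_cases = [math.log10(c) for c in cases]
log_deaths = [math.log10(d) for d in deaths]

r = pearson_r(log_cases, log_deaths)
print(f"Pearson r on log10 counts: {r:.2f}")
```

Counts of zero would need separate handling before taking logarithms; the sketch assumes strictly positive values.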
As the simple preliminary regressions (Figures 1 and 2) showed that most countries below the regression curve were islands or countries with few international borders, such as Portugal, we included, in a stepwise regression, the factor "number of international borders" as a proxy for immigration by land. Subsequently, we repeated the analysis using the number of cases per 1 million people and compared the regression residuals to detect the relevance of population size in affecting the disease distribution. As the number of cases correlates with population size (r=0.57), the latter could hardly be tested directly as an explanatory factor. However, the role of the number of people per se can be separated out by comparing the coefficient of determination and the residuals between regressions with the absolute and the relative number of cases. Data were taken from the Worldometer website (Worldometer 2020; Supplementary Material - Table SI). Although case numbers are strongly amplified by local community transmission, we chose a date by which the pandemic might already have consolidated worldwide. Even with local transmission varying between places, it seems that dissemination by travelling is indeed highly correlated with, and sensitive to, the early phase of exponential growth of the disease (Coelho et al. 2020, Dattilo et al. 2020).
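The stepwise procedure described above can be sketched as a forward selection in which predictors enter one at a time. This is a minimal illustration under stated assumptions, not the authors' implementation: the entry criterion used here (a minimum gain in R², `min_gain`) is our assumption, since the text does not state which criterion was applied, and the eight data points are hypothetical placeholders standing in for log-scale country values.

```python
def fit_r2(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda idx: abs(aug[idx][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(k):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(predictors, y, min_gain=0.01):
    """Add, at each step, the predictor with the largest R^2 gain (assumed criterion)."""
    selected, steps, best_r2 = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        trial = {name: fit_r2([predictors[s] for s in selected] + [col], y)
                 for name, col in remaining.items()}
        name = max(trial, key=trial.get)
        if trial[name] - best_r2 < min_gain:
            break  # no remaining predictor improves the fit enough
        best_r2 = trial[name]
        selected.append(name)
        steps.append((name, best_r2))
        del remaining[name]
    return steps

# Hypothetical values (placeholders, NOT the study data).
predictors = {
    "passengers": [1, 2, 3, 4, 5, 6, 7, 8],
    "borders": [3, 1, 4, 1, 5, 9, 2, 6],
}
cases = [5.3, 4.8, 10.1, 8.6, 15.2, 20.9, 16.4, 21.7]

steps = forward_stepwise(predictors, cases)
for name, r2 in steps:
    print(f"added {name}: cumulative R^2 = {r2:.3f}")
```

Comparing the cumulative R² after each step mirrors the comparison of coefficients of determination between models described in the text.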
For the number of passengers arriving in a country, we used IATA's WATS: World Air Transportation Statistics 2019 report (IATA 2019), which provides as its latest figures the number of flights worldwide in 2018 and, thus, covers a period that reflects well the flight frequencies between countries before the declaration of the pandemic (Table SI). Unlike Coelho et al. (2020), we were not interested in the air transportation network or modular connectivity effects, but in the effect of the number of travellers reaching each country per period, which serves as a proxy for viral load pressure on the population. Because SARS-CoV-2 may have started spreading long before countries decided to slow down or close airports (Wu et al. 2020), we assumed the number of passengers before the crisis to be the best estimate of exposure to travelling load.

RESULTS
As expected, countries receiving more flights were the most densely populated, regardless of GDP (such as China, India, United States, Indonesia), the most developed (Japan, top European countries, Canada, Australia, and United States), or among the 20 largest economies (previous list plus Brazil, South Korea, Mexico). We found that both the number of passengers arriving in a country and the number of international borders significantly explained the number of cases of COVID-19 on May 22, 2020. These combined variables had negligible collinearity (tested by the variance inflation factor, taking VIF<2.5 as the threshold) and were thus complementary explanations, covering 49.9% of the data variance (Fig. 1; Table I).
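For readers unfamiliar with the collinearity check mentioned above, the sketch below computes a variance inflation factor for the two-predictor case, where VIF = 1 / (1 - r²) and r is the Pearson correlation between the two predictors. The predictor values are hypothetical placeholders, not the study data, and the function names are ours; only the 2.5 threshold comes from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictor values (placeholders, NOT the study data).
passengers = [1, 2, 3, 4, 5, 6, 7, 8]
borders = [3, 1, 4, 1, 5, 9, 2, 6]

VIF_THRESHOLD = 2.5  # threshold stated in the text
vif = vif_two_predictors(passengers, borders)
print(f"VIF = {vif:.2f}, below threshold: {vif < VIF_THRESHOLD}")
```

With more than two predictors, r² would be replaced by the R² obtained from regressing each predictor on all the others.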
The response of the number of cases corrected by country population size was also significant but explained a lower percentage of the data variance. In addition, at this scale, the number of terrestrial borders was not significant in the model (Fig. 2; Table II). On the other hand, the residual analysis showed a more uniform structure for the relative than for the absolute number of cases (only four atypical cases against 12 for the first model), which corroborates the hypothesis that population size absorbed a substantial part of the data variance, given that the number of confirmed infections is still a tiny proportion of the total population of any country. The resulting fit separated the countries highly connected by travel into a densely populated group (China, India, Japan, and Indonesia) below the model line, and a western (geographically speaking) and developed group above the model line. The pattern revealed by comparing these two models is that highly connected countries contributed similarly to the early spreading of COVID-19, but with contrasting impacts depending on each country's demography and structure. Additionally, the profoundly affected European and American countries (the USA most notably) were proportionally more affected than oriental countries (Fig. 2).

DISCUSSION
Our results demonstrated how the international traffic of people may have influenced the transmission of SARS-CoV-2 between countries and contributed to the establishment of the pandemic. These results are important to reinforce the containment measures proposed in the International Health Regulations (RSI2005), considering the world panorama under the current pandemic. Far from denying the functional effects of quarantine or lockdowns, our results showed that a stricter entrance control, if internationally coordinated early in the current year, could have greatly reduced the level and speed of SARS-CoV-2 dissemination by decelerating the rates of transmission through borders and ports. As this did not happen, it is evident that those countries or states where local governments acted fast, before uncontrolled community transmission was in place, managed to prevent an overly steep case curve (Wu et al. 2020). Moreover, the disproportionately high number of cases of COVID-19 in large and highly connected European and American countries is directly associated with the fact that their cities receive more flights and, therefore, are central hubs in the international flight network (Coelho et al. 2020). These highly connected cities were those that needed to block airports first. As they did not, cities such as Milan, Madrid, New York, London, Paris, São Paulo, and Rio de Janeiro were hit harder. Considering the national flight network that radiates from these central hub cities, it would be essential to detect them early to prevent each infected city from becoming an internal source of the disease, as was the case for Brazil. By locking down only the most important airports in the flight transportation network, a government could achieve a more efficient and less economically damaging airport sanitary control.

CONCLUSIONS
In conclusion, we found that one of the most straightforward explanations for a country's vulnerability is likely how well connected it is internationally in the early stages of a pandemic, before severe local community transmission is in place, both through the number of flight passengers and through the number of international borders. Closing the borders took time to be accepted worldwide, and the consequences of travelling for the COVID-19 pandemic have been neglected in the literature. It seems that most of the epidemiological studies appearing in preprint repositories or already published are concerned with the management of community transmission in already established outbreaks.
Herein, we considered one aspect of the basic ecology of the virus, i.e., the mechanism of dissemination, which is fundamental to understanding both its early spread and its zoonotic origin and spillover, and which might require more in-depth studies, as shown by Zhao et al. (2020). Despite human cultural and civilizational complexity, for the virus we are just an abundant host, and the intensity and the way in which we connect populations by travelling and migration are a basic biological component of the dissemination of viral species worldwide (Allen et al. 2017, Johnson et al. 2019). Ecological theory has shown that species dispersion follows basic, neutral mathematical rules (sensu Hubbell 2001). Considering a country as a habitat to be invaded, the larger the border, or the closer the habitat is to the source, the more virus load it will receive (MacArthur & Wilson 1967). Based on this simple principle, and on WHO's RSI2005 guidelines (which should have been strictly observed), countries could have produced customized protocols for early warning at airports along a gradient of risk. By applying neutral theory principles to virus dissemination, one can understand in an uncomplicated way the beginning of a new disease outbreak, which may be a key aspect in stopping future pandemic events, mostly because essential details of the biology of an infectious disease are still unknown when a pathogen spills over to humankind (Andersen et al. 2020; Vespignani et al. 2020). In short, the pandemic was strongly sensitive to the initial conditions of the virus dissemination, before the complex social, political, and anthropological components of community transmission took place. Sadly enough, essential details for proper predictive mapping of infectious diseases, such as human distribution and vulnerability, or disease-relevant environmental covariates, are still not sufficiently studied (Hay et al. 2013).
Assuming there will be a future zoonotic virus pandemic threat, the world could create a coordinated strategy for the fast and efficient blocking of any travelling infectious disease. This would require humans to develop robust governance, monitoring, and knowledge sharing at the international level.