Mapping the global geographic potential of Zika virus spread

The Americas are presently experiencing the most serious known outbreak of Zika virus (ZIKV). Here, we present a novel set of analyses using environmental characteristics, vector mosquito distributions, and socioeconomic risk factors to develop the first map to detail global ZIKV transmission risk in multiple dimensions based on ecological niche models. Our model predictions were tested against independent evaluation data sets, and all models had predictive ability significantly better than random expectations. The study addresses urgent knowledge gaps regarding (1) the potential geographic scope of the current ZIKV epidemic, (2) the global potential for spread of ZIKV, and (3) drivers of ZIKV transmission. Our analysis of potential drivers of ZIKV distributions globally identified areas vulnerable in terms of some drivers, but not for others. The results of these analyses can guide regional education and preparedness efforts, such that medical personnel will be better prepared for diagnosis of potential ZIKV cases as they appear.

Ecological niche modeling - We approximated the ZIKV fundamental ecological niche through ecological niche modeling (ENM), using the maximum entropy algorithm implemented in Maxent, version 3.3 (Phillips et al. 2006). Peterson et al. (2011) defined the fundamental niche as "the set of environmental conditions required for the species to maintain populations without immigrational subsidy". ENM relates known occurrences of a species to a set of environmental variables in a maximum entropy, evolutionary-computing environment to approximate the set of environmental conditions associated with maintenance of populations (Peterson et al. 2011).
We calibrated ENMs across Mexico, Central America, and South America, where ZIKV occurrence data were sufficiently dense for rigorous model calibration (Owens et al. 2013); models were then projected worldwide for interpretation. To explore the contributions of different suites of variables to shaping the distribution of ZIKV, we used different combinations of environmental variables, socioeconomic variables, and accessibility (see Neerinckx et al. 2008). These explorations illuminate the roles of possible drivers of ZIKV transmission beyond climate alone (Kilpatrick & Randolph 2012, Weaver 2013). The full set of driver combinations that we explored is presented in the Supplementary Table. For each combination, we ran Maxent using 100 bootstrap replicates. The median of the replicate outputs was used as the final estimate in subsequent analyses. Final models were thresholded based on a maximum allowable omission error rate of 5% (E = 5%; Peterson et al. 2008), in effect assuming that < 5% of occurrence data would have sufficient error in geolocation that variable values might be misrepresented.
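The replicate consensus and thresholding steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code used in the study: the function names, the array layout (replicates stacked along the first axis), and the exact rule for choosing the threshold (the highest value that omits no more than a fraction E of calibration occurrences) are assumptions made for the example.

```python
import numpy as np

def median_consensus(replicates):
    """Cell-wise median over replicate suitability grids.

    `replicates` is assumed to have shape (n_replicates, rows, cols).
    """
    return np.median(np.asarray(replicates, dtype=float), axis=0)

def omission_threshold(occurrence_scores, e=0.05):
    """Highest threshold that omits at most a fraction `e` of occurrences.

    `occurrence_scores` are the suitability values extracted at the
    calibration occurrence points (hypothetical input for this sketch).
    """
    scores = np.sort(np.asarray(occurrence_scores, dtype=float))
    n_omit = int(np.floor(e * scores.size))  # occurrences allowed below the threshold
    return scores[n_omit]

def binarize(suitability, threshold):
    """Boolean grid: True where suitability reaches the threshold."""
    return np.asarray(suitability, dtype=float) >= threshold
```

With 100 occurrence scores and E = 5%, for instance, the five lowest-scoring occurrences fall below the threshold and the rest are retained as predicted present.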
Model predictions were evaluated for statistical significance based on predictions among random subsets of 50% of available data. In effect, these tests assess whether a model based on a certain amount of occurrence information across a region will be able to anticipate the next set of occurrences across that same landscape; while not an ideal test of predictive ability when models are transferred to other regions, it is at present the only test available to us. We used partial receiver operating characteristic (ROC) statistics (Peterson et al. 2008), which avoid well-known problems with traditional ROC approaches (Lobo et al. 2008). Partial ROC statistics were calculated using the PartialROC function in the ENMGadgets package in R software version 3.2.0 (R Development Core Team 2015), specifying the same E = 5%, a 50% bootstrap resampling, and 100 random iterations (Lobo et al. 2008).
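The logic of the partial ROC test can be illustrated with a simplified sketch. It is written in Python with NumPy and is not the ENMGadgets PartialROC implementation; the function names and inputs are hypothetical. The assumptions are that the x-axis is the proportion of the study area predicted suitable, the y-axis is sensitivity on independent test occurrences, the curve is restricted to sensitivity of at least 1 - E, and the statistic is the ratio of the area under the model curve to the area under the null line (y = x) over the same range.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y over x (avoids NumPy-version-specific names)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def partial_roc_ratio(test_scores, background_scores, e=0.05):
    """Ratio of partial AUC (model) to partial AUC (null), for omission <= e."""
    test = np.sort(np.asarray(test_scores, dtype=float))
    bg = np.sort(np.asarray(background_scores, dtype=float))
    # Candidate thresholds from high to low, so both curves are non-decreasing.
    thresholds = np.unique(np.concatenate([test, bg]))[::-1]
    # Proportion of values >= threshold, computed on the sorted arrays.
    x = 1.0 - np.searchsorted(bg, thresholds, side="left") / bg.size
    y = 1.0 - np.searchsorted(test, thresholds, side="left") / test.size
    keep = y >= 1.0 - e
    x, y = x[keep], y[keep]
    null_area = _trapezoid(x, x)
    if x.size < 2 or null_area == 0.0:
        return float("nan")  # curve segment too short to evaluate
    return _trapezoid(y, x) / null_area

def partial_roc_bootstrap(test_scores, background_scores, e=0.05,
                          fraction=0.5, iterations=100, seed=0):
    """Resample test points with replacement; return ratios and a p-value."""
    rng = np.random.default_rng(seed)
    test = np.asarray(test_scores, dtype=float)
    n = max(1, int(round(fraction * test.size)))
    ratios = np.array([
        partial_roc_ratio(rng.choice(test, size=n, replace=True),
                          background_scores, e)
        for _ in range(iterations)
    ])
    # Share of replicates not better than random (NaN counts against the model).
    p_value = float(np.mean(~(ratios > 1.0)))
    return ratios, p_value
```

Under these assumptions, ratios above 1 indicate predictions better than random, and the p-value is the proportion of bootstrap replicates in which the ratio does not exceed 1.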
For visualisation, we combined two of these thresholded models (Model 3 and Model 4) to illustrate differences between predictions based on different possible drivers of ZIKV transmission. Interesting contrasts emerge from differences in suitability based on all of the environmental dimensions, areas identified as suitable based on climate and presence of vector species, and suitability in terms of human socioeconomic variables and accessibility, as is exemplified in the Figure in the main paper. GIS-readable grids (GeoTIFF format) and Google Earth keyhole markup language (KML) files are available via Figshare (https://figshare.com/s/0257ff447ccc11373e41).
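Combining two thresholded models into a single categorical map can be sketched as below. This is an illustrative Python/NumPy example only: the function name, the integer coding, and the assignment of the two inputs to particular models or map colours are assumptions, not details taken from the study.

```python
import numpy as np

# Hypothetical category codes for the combined map.
NEITHER, FIRST_ONLY, SECOND_ONLY, BOTH = 0, 1, 2, 3

def overlay(first, second):
    """Combine two binary suitability grids into one categorical grid.

    Each cell receives 0 (neither model), 1 (first model only),
    2 (second model only), or 3 (both models).
    """
    first = np.asarray(first, dtype=bool)
    second = np.asarray(second, dtype=bool)
    if first.shape != second.shape:
        raise ValueError("grids must share the same shape")
    return first.astype(np.uint8) + 2 * second.astype(np.uint8)

# Toy 2 x 2 grids standing in for two thresholded model outputs.
environment_and_vectors = [[True, False], [True, False]]
human_and_accessibility = [[True, True], [False, False]]
combined = overlay(environment_and_vectors, human_and_accessibility)
```

Each category code can then be assigned a colour when the combined grid is rendered, so that areas suitable under one set of drivers, the other, or both are visually distinct.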

SUPPLEMENTARY
Supplementary Table: check marks indicate that the variable was used in the model; X's indicate variables that were not used in the model.
Supplementary figure 6: potential geographic distribution of ZIKV based on environmental variables only (Model 1) in relation to the known global distribution of the virus. Note that models were calibrated based on occurrences in Mexico, Central America, and South America, such that occurrences in Africa, Asia, and the Pacific made no contribution to model calibration. Countries reported with autochthonous cases confirmed by isolation or PCR (orange shading with brown boundaries) and countries with known seropositive cases (red stippled areas and light-blue shading) are shown in relation to the model predictions (in blue, and in purple and light blue where overlapped by the known-positive cases). Note the generally close correspondence between the global projection of our models and countries where ZIKV has been detected via virus isolation, PCR, or serological studies.

Supplementary data
Supplementary figure 7: close-up of North America to provide detail additional to the figure in the main paper. Orange areas were identified as suitable based on drivers related to physical environment and vector populations; purple areas were identified as suitable based on drivers related to human conditions and accessibility; blue areas were identified as suitable in terms of all drivers considered.
Supplementary figure 8: close-up of Europe to provide detail additional to the figure in the main paper. Orange areas were identified as suitable based on drivers related to physical environment and vector populations; purple areas were identified as suitable based on drivers related to human conditions and accessibility; blue areas were identified as suitable in terms of all drivers considered.
Supplementary figure 9: close-up of Asia and Australia to provide detail additional to the figure in the main paper. Orange areas were identified as suitable based on drivers related to physical environment and vector populations; purple areas were identified as suitable based on drivers related to human conditions and accessibility; blue areas were identified as suitable in terms of all drivers considered.
Supplementary figure 10: close-up of South America to provide detail additional to the figure in the main paper. Orange areas were identified as suitable based on drivers related to physical environment and vector populations; purple areas were identified as suitable based on drivers related to human conditions and accessibility; blue areas were identified as suitable in terms of all drivers considered.