Using distribution models to estimate blooms of phytosanitary cyanobacteria in Brazil

The multiple uses of aquatic ecosystems by humankind and the continuous interference of human activities have contributed to the emergence of potentially toxic cyanobacterial blooms. Here, we first created a database of occurrences of cyanobacterial blooms in Brazil through a systematic review of the scientific literature available on online platforms (e.g. Web of Science, Capes Thesis Catalogue). Second, we built ecological niche models with occurrence data obtained from these studies to predict climatically suitable areas for blooms. We selected 21 bioclimatic variables as input environmental data. We used five modeling methods for the current climate scenario: (1) Maxent; (2) Support Vector Machines; (3) Random Forest; (4) Maximum Likelihood; and (5) Gaussian. We found that the number of publications about bloom events was highest in 2009, with a decline in 2012, 2013 and 2017. Furthermore, the years with the highest numbers of bloom records in freshwater environments were 2005, 2011 and 2014. These events occurred mainly in public supply reservoirs and were mostly of the genera Microcystis Lemmermann, 1907, Dolichospermum (Ralfs ex Bornet & Flahault) P.Wacklin, L.Hoffmann & J.Komárek, 2009 and Raphidiopsis F.E.Fritsch & F.Rich, 1929. By modeling the potential distribution of blooms, we found sampling gaps that should be targeted by future research, especially in the Amazon biome. Overall, the models did not predict highly suitable areas in the north of Brazil, while the other regions were relatively well covered, with a higher number of occurrence records in the Southeast region.


Introduction
Freshwater ecosystems sustain much of Earth's biodiversity, providing multiple products and ecological services to humankind (Laurance et al. 2014). However, these ecosystems are suffering from several kinds of human pressures, such as changes in land use and land cover (Huisman et al. 2018). Inadequate use of natural resources, as well as the various processes of overexploitation, pollution, eutrophication, dam construction and silting, has intensified over recent decades (Hannah et al. 2013), causing negative impacts on the environment and the health of human populations (Green et al. 2015). Specifically, this broad degradation has generated a loss of species and habitats, threatening several biological communities of rivers, lakes and flood plains (Vitule et al. 2017).
Overgrowth of algae, especially cyanobacteria, is one of the problems in aquatic environments (Walls et al. 2018). It results from the interaction of physical, chemical and biotic factors (Behrenfeld & Boss 2017) and is marked mainly by an increased density of cyanobacteria, which are broadly distributed geographically and respond rapidly to environmental changes in aquatic environments (Padisák et al. 2016), such as light intensity, CO2 accessibility, high pH and low N:P ratio (Genuário et al. 2016). The growth of harmful cyanobacteria in high densities, known as water-blooms or blooms (Paerl & Otten 2013), produces a variety of cyanotoxins (Kosten et al. 2012) that may cause liver, digestive and neurological diseases when ingested by birds or mammals (Mantzouki et al. 2018). Furthermore, blooms directly affect water quality by producing taste and odor, increasing turbidity, decreasing submerged aquatic vegetation (Merel et al. 2013), and promoting the death of fish and benthic invertebrates (Josué et al. 2018). The problems tend to increase with the abundance, frequency, and extent of blooms (O'Neil et al. 2012). Thus, it is of great importance to know what determines the occurrence of these events in aquatic environments (Glibert et al. 2008).
The spatial and temporal distributions of these organisms are determined by environmental characteristics (Hernández-Fariñas et al. 2014), biological interactions and the dispersal capacity of the species (Soberón & Peterson 2005, Soberón 2007). These species act as "refined sensors of environmental properties" because they respond quickly to variations in the availability of environmental resources (Mekonnen & Hoekstra 2018). However, the lack of knowledge about the global distribution and abundance of algae restricts our ability to understand the mechanisms that determine their distribution (Flombaum et al. 2013). In general, geographic distributions are poorly known and have numerous information gaps (Moreira et al. 2013). The Wallacean shortfall refers to inadequate knowledge about species' geographic distributions (Whittaker et al. 2005, Hortal et al. 2015) and is a particularly important constraint on improving our understanding of cyanobacterial bloom events (Hortal et al. 2015). The predominance of studies close to traditional research centers (Newbold 2010) or the uneven spatial distribution of infrastructure (Oliveira et al. 2016) may also generate a geographic bias in known cyanobacterial bloom events (Sastre & Lobo 2009).
To reduce the Wallacean shortfall, it is necessary to find the potential geographic gaps for these organisms and fill them in. In this context, scientometric studies can contribute significantly to a general analysis of the patterns found. This research method allows us to measure the available data on species geographic distribution and to assess existing citations (Carneiro et al. 2008), revealing trends and gaps in scientific production (Debackere et al. 2002).
Additionally, scientometric analyses can identify gaps in cyanobacterial geographic distribution, helping to formulate new hypotheses about the mechanisms that determine these distributions. It is then possible to use Ecological Niche Models (hereafter ENMs) to fill these gaps (Jensen et al. 2017). ENMs are statistical procedures that use species occurrence records to estimate suitable areas through environmental similarity between different sites (Peterson 2017). These models assume that the species' ecological niche is fully known and never changes over time, and they depend completely on the number and the distribution pattern of the occurrence records (Peterson 2011). Therefore, it is possible to estimate new environmentally similar areas where the species may occur. These models are widely used to (i) define potential distributions (Flombaum et al. 2013); (ii) indicate suitable areas for future sampling (Jensen et al. 2017); (iii) test biogeographic and evolutionary hypotheses (Silva et al. 2014); (iv) suggest the establishment of conservation units (Loyola et al. 2008, Nóbrega & De Marco 2011); and (v) determine how species respond to climate change (Barton et al. 2016, Oliveira et al. 2015). Given that toxic cyanobacterial blooms are a recurring problem in several freshwater reservoirs in many tropical countries and that these events make drinking water unfit for human consumption (Mowe et al. 2015), there is a strong interest in developing the ability to predict the occurrence of cyanobacterial blooms in freshwater environments. A major obstacle to reducing cyanobacterial growth events in freshwater ecosystems is the lack of reliable data on the distribution of these species.
From this perspective, we describe the number of publications collected from the scientific literature that documented the occurrence of cyanobacterial blooms over the years, as well as the orders responsible and the Brazilian states where the blooms were recorded. We also used the occurrence data obtained in this scientometric investigation to estimate climatically suitable areas for the occurrence of cyanobacterial bloom events. In our view, the wide distribution of bloom events reflects the arrangement of their corresponding habitats, and the occurrence of these species is restricted to environments that correspond to their specific adaptations. As cyanobacterial bloom events are more common in lentic environments (e.g. reservoirs), we hypothesize that locations with higher intensity of use (e.g. higher population density) and greater damming of rivers (e.g. reservoirs) will exhibit a higher frequency of blooms.

Database of bloom events in freshwater environments
We created our database of cyanobacterial bloom occurrences in Brazil through a systematic review of the scientific literature available on the platforms Web of Science (WoS, http://apps.isiknowledge.com), maintained by Clarivate Analytics, and Capes Thesis Catalogue (http://catalogodeteses.capes.gov.br), using the search string [("bloom*") AND ("Brasil" OR "Brazil") AND ("cyanobacteria" OR "cyanophyceae")] and the term "florações" (Portuguese for "blooms"). The WoS database has the advantage of providing data on publications over a broad time span, presenting detailed and accurate data on scientific articles, and is widely used in systematic review articles (Falagas et al. 2007). The Capes Thesis Catalogue stores many dissertations and theses published in Brazil, facilitating the compilation of bloom events that occurred in the Brazilian territory.
In both databases, we searched for articles and reviews that contained the search terms in the title, abstract and/or keywords (access date: May 22nd, 2018). We established two criteria to select the occurrence records in the scientific articles: (1) cyanobacterial blooms classified according to the distribution of cells and individuals in the water column (accumulation of high concentrations of chlorophyll-a in the first centimeters of the water surface; accumulation of high concentrations of chlorophyll-a at depth; and cells dispersed in the water column); and (2) blooms defined according to the concentration of chlorophyll-a (minimum concentration of 10 μg L⁻¹ of chlorophyll-a; and minimum density of 20,000 cells/mL of cyanobacteria) (De León & Chalar 2003). In our search, we obtained 208 scientific articles in the WoS database, selected 98 studies after reading the abstracts and included 47 in our study. In the Capes Thesis Catalogue, we found 385 records. However, not all records were available for reading. Thus, according to the established criteria, we were able to include only 18 studies in our database. We also included 10 studies cited in the bibliographic review of Freitas et al. (2012), which presents a synthesis of bloom events in Brazil. Finally, we added two other scientific studies found in two Brazilian repositories (Universidade Nacional de Brasília and Universidade Federal de Goiás). Of the 77 papers found in the scientific literature, five reported the same area and the same species or were located in marine environments and were therefore not included.

Scientometric analysis
We compiled a list of 72 scientific studies that mention cyanobacterial blooms in freshwater environments in Brazil. The species names were updated following information from the On-line Database of Cyanobacterial Genera (CyanoDB.cz, http://www.cyanodb.cz/) (Komárek et al. 2014). We classified the occurrence records according to the distribution of the taxa at the collection sites. We corrected possible georeferencing errors considering several quality criteria (latitude and longitude exchange; occurrence records outside of the freshwater environments; and duplicate records) (Giovanni et al. 2012). When the latitude and longitude were incorrect, but there was information about the sampling and collecting site, we used Google Earth to get surrogate information. Each event record in a given location was considered as a sample. Bloom events that occurred in distinct months were considered as different records. To estimate the sampling effort in Brazil, we counted the number of bloom events in 1-degree cells.
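The counting of bloom events in 1-degree cells described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `count_events_per_cell`, the `cell_size` parameter and the example coordinates are hypothetical.

```python
import math
from collections import Counter

def count_events_per_cell(records, cell_size=1.0):
    """Count bloom records per grid cell.

    records: iterable of (latitude, longitude) pairs in decimal degrees.
    Returns a Counter keyed by the south-west corner of each cell.
    """
    counts = Counter()
    for lat, lon in records:
        # floor() maps each point to the lower-left corner of its cell,
        # which also works for negative (southern/western) coordinates.
        corner = (math.floor(lat / cell_size) * cell_size,
                  math.floor(lon / cell_size) * cell_size)
        counts[corner] += 1
    return counts

# Hypothetical coordinates, for illustration only (not from the database).
example = [(-23.7, -46.6), (-23.2, -46.9), (-5.6, -36.9)]
cell_counts = count_events_per_cell(example)
```

Here the first two points fall in the same 1-degree cell, so that cell receives a count of two, which is how the sampling effort per cell would accumulate.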
Then, we produced a map showing the total number of bloom events distributed across Brazil. We also produced bar charts showing the number of bloom events and the number of scientific studies published per year. To identify whether there is a relationship between the number of blooms and the number of scientific studies, we performed a simple linear regression analysis between those variables. We checked the normality of the residuals and used a log(x + 1) transformation to meet that basic assumption, following the protocol for data exploration provided by Zuur et al. (2010). To identify whether there was a relationship between the number of blooms in freshwater environments and population density, we performed a simple regression analysis between these variables under the same assumption. For this, we considered the population density per municipality and the year for each point where a bloom event occurred.
For the population estimates, we used the information provided by the IBGE (https://cidades.ibge.gov.br/). We merged records with the same geographic coordinates, which reduced our N from 90 points to 59 records of blooms in freshwater environments.
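The regression step described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not say whether the log(x + 1) transformation was applied to one or both variables, so the example transforms both, and the function name `log1p_regression` is hypothetical.

```python
import numpy as np

def log1p_regression(x, y):
    """Simple linear regression of log(y + 1) on log(x + 1).

    Assumption: both variables are transformed; the source does not specify.
    Returns slope, intercept, R^2, the F statistic and the residuals.
    """
    lx = np.log1p(np.asarray(x, dtype=float))
    ly = np.log1p(np.asarray(y, dtype=float))
    slope, intercept = np.polyfit(lx, ly, deg=1)
    residuals = ly - (intercept + slope * lx)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((ly - ly.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n = len(lx)
    # For one predictor, F = R^2 / (1 - R^2) * (n - 2).
    f_stat = r2 / (1.0 - r2) * (n - 2) if r2 < 1.0 else float("inf")
    return {"slope": float(slope), "intercept": float(intercept),
            "r2": r2, "F": f_stat, "residuals": residuals}
```

The returned residuals are the values whose normality would be checked (for example with a normal quantile plot) before interpreting the fit.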

Environmental variables
To produce the ENMs, we used environmental variables obtained from the WorldClim 1.4 (http://worldclim.org/current; Hijmans et al. 2005) and WorldClim 2.0 (http://worldclim.org/version2; Fick & Hijmans 2017) databases. We selected all 19 bioclimatic variables from WorldClim 1.4, plus average altitude and solar radiation from WorldClim 2.0, as input environmental data. These variables have a spatial resolution of 5 arc-minutes (cell size of ≈10 km). We considered variables already reported in other studies. For example, temperature is often considered the most important determinant of growth and metabolism in freshwater algae, including cyanobacteria, due in part to the fact that many of the enzymatic reactions involved in photosynthesis and respiration are temperature dependent. We included solar radiation because it is an essential resource for photosynthesis in these autotrophic organisms (Walls et al. 2018). We used altitude because it is highly related to weather variables (Teittinen et al. 2017). Finally, we chose precipitation because the highest nutrient transport to aquatic ecosystems takes place in the rainy season, when an increase in the density of cyanobacteria is observed. Furthermore, climate variations can modify the community structure in freshwater ecosystems. For instance, the presence of cyanobacteria may be strongly influenced by physical factors, such as local climate conditions (Karadžić et al. 2013). Extreme precipitation in a reservoir increased nutrient concentrations and then altered the composition of the phytoplankton community in favor of cyanobacteria, with the first bloom events appearing after the suppression of other species (Simić et al. 2017).
To reduce the multicollinearity of the data, we performed a Principal Component Analysis (PCA) (Pearson 1901). Before the PCA, each variable was standardized by subtracting its mean from the individual values and dividing the result by its standard deviation (z transformation). Thus, all variables have zero mean and unit standard deviation. We then produced 21 orthogonal (independent) principal components and selected the first seven, which accounted for 96.4% of the variation of the original dataset (Table 1). This method allows the variables to have the same importance in the ENM predictions (Dormann et al. 2012). Consequently, it also helps to avoid overfitting of the models, which can result in unreliable predictions (De Marco & Nóbrega 2018).
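The standardization and component selection described above can be sketched as follows. This is a minimal illustration using a singular value decomposition, not the software actually used in the study; the function name `pca_components` and the parameter `variance_target` are hypothetical, and the default of 0.964 simply mirrors the proportion of variance reported in the text.

```python
import numpy as np

def pca_components(X, variance_target=0.964):
    """PCA on z-transformed variables.

    X: array of shape (n_cells, n_variables).
    Returns the component scores, the proportion of variance explained
    by every component, and the number of components retained.
    """
    X = np.asarray(X, dtype=float)
    # z transformation: zero mean and unit standard deviation per variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the orthogonal principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # Smallest number of components whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(cumulative, variance_target)) + 1, len(s))
    scores = Z @ Vt[:k].T
    return scores, explained, k
```

The retained score columns are mutually uncorrelated, which is the property that makes them usable as independent predictors in place of the original correlated variables.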

Modeling procedures
We fitted the ENMs only for orders of cyanobacteria that had a minimum of 10 occurrence points. For the phylum Cyanobacteria, we compiled a total of 109 occurrence records, of which 47 belonged to Chroococcales, 44 to Nostocales, 12 to Oscillatoriales and 6 to Synechococcales. We also included a general model for Cyanobacteria to represent the order Synechococcales in our study. To ensure independence between the datasets used to fit and to evaluate the models, we chose to use geographic partitions in a grid format, similar to a checkerboard (Muscarella et al. 2014).
This partition subdivides the study area equally and in a spatially independent manner, alternating between training cells (used to fit the model) and testing cells (used to evaluate it). We used five ENM algorithms to model the distribution of bloom events: (1) Maximum Entropy (MXE) (Phillips et al. 2006); (2) Support Vector Machine (SVM) (Guo et al. 2005); (3) Random Forest (RDF) (Breiman 2001); (4) Maximum Likelihood (MLK) (Royle et al. 2012); and (5) Gaussian (GAU) (Golding & Purse 2016). The MXE algorithm is a machine-learning technique that estimates the probability distribution closest to uniform, subject to the constraint that the expected value of each variable agrees with the empirical values observed in the occurrence records (Phillips et al. 2006). This technique constrains the possibilities of adjusting linear or quadratic functions, reducing the complexity of the models and producing better predictions in certain situations (Phillips et al. 2004, 2017). The SVM algorithm is a set of supervised learning methods belonging to the family of generalized linear classifiers. This algorithm reduces the probability of misclassifying patterns not observed by the distribution of data probabilities (Rangel & Loyola 2012). SVM creates hyperplanes to separate the occurrence records from absence sets (Guo et al. 2005). The RDF algorithm produces accurate predictions that do not overfit the data, fitting the models based on decision trees that use a random subset of predictors (Breiman 2001).
The MLK algorithm predicts the species occurrence probability in a given location by estimating a distribution of occurrence probability based on observed environmental conditions (Royle et al. 2012). The GAU algorithm predicts the species occurrence probability based on adjustments made by Bayesian inference (Golding & Purse 2016).
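The checkerboard-style geographic partition mentioned above can be sketched as follows. This is an illustrative sketch only: the cell size is not given in the text, so `cell_size=2.0` is an arbitrary assumption, and the function names `checkerboard_fold` and `split_records` are hypothetical.

```python
import math

def checkerboard_fold(lat, lon, cell_size=2.0):
    """Return 0 or 1 depending on the colour of the checkerboard cell.

    Neighbouring cells always receive different folds, so training and
    testing records are separated in space.
    """
    row = math.floor(lat / cell_size)
    col = math.floor(lon / cell_size)
    # Python's % returns a non-negative result for negative row + col.
    return (row + col) % 2

def split_records(records, cell_size=2.0):
    """Split (lat, lon) records into two spatially alternating folds."""
    folds = {0: [], 1: []}
    for lat, lon in records:
        folds[checkerboard_fold(lat, lon, cell_size)].append((lat, lon))
    return folds[0], folds[1]
```

One fold can be used to fit the model and the other to evaluate it, and the roles can then be swapped.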
We created pseudo-absences to meet the requirements of some algorithms. Here, we used bioclimatic envelopes similar to the BioClim algorithm (Booth et al. 2014). This procedure constrains the occurrence points of the taxa in geographical space using a bioclimatic envelope (VanDerWal et al. 2009, Lobo & Tognelli 2011). The area outside the envelope is then considered unsuitable for the occurrence of the species. In this area, pseudo-absences are created in a ratio of 1:1. We used the threshold that maximizes the sum of the sensitivity and specificity obtained from the Receiver Operating Characteristic (ROC) curve. This curve is the graphical representation of the true positive rate against the false positive rate at several threshold settings. We measured the performance of the ENM algorithms with the True Skill Statistic (TSS; Allouche et al. 2006). TSS is a threshold-dependent metric and ranges from -1 to 1. Predicted distributions with negative values or values close to zero are not considered better than random models. 'Acceptable' projections for potential species distributions generally reach TSS values close to 0.5. 'Good' projections reach TSS values close to 0.7, while 'excellent' projections reach values close to 0.9. We represented the final distributions using consensus maps to reduce the uncertainties associated with each algorithm (Araújo & New 2007). We built the consensus maps by averaging the models whose TSS values were above the average. The idea behind consensus models is that different errors may affect the final result (e.g. sensitivity of the models, lack of true absences). For this reason, it has been argued in the literature that the use of consensus maps as final distribution models may reduce the number of errors (Diniz Filho et al. 2010).
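The threshold selection, TSS calculation and consensus step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: scores at or above the threshold are treated as predicted presences, candidate thresholds are taken from the observed scores, and the function names `tss`, `best_threshold` and `consensus_map` are hypothetical.

```python
import numpy as np

def tss(presence_scores, absence_scores, threshold):
    """True Skill Statistic = sensitivity + specificity - 1."""
    p = np.asarray(presence_scores, dtype=float)
    a = np.asarray(absence_scores, dtype=float)
    sensitivity = np.mean(p >= threshold)  # presences predicted present
    specificity = np.mean(a < threshold)   # (pseudo-)absences predicted absent
    return float(sensitivity + specificity - 1.0)

def best_threshold(presence_scores, absence_scores):
    """Threshold maximizing sensitivity + specificity (equivalently, TSS)."""
    candidates = np.unique(np.concatenate([presence_scores, absence_scores]))
    values = [tss(presence_scores, absence_scores, t) for t in candidates]
    i = int(np.argmax(values))  # first maximum if there are ties
    return float(candidates[i]), values[i]

def consensus_map(suitability_maps, tss_values):
    """Average the maps of the algorithms whose TSS is above the mean TSS."""
    maps = np.asarray(suitability_maps, dtype=float)
    scores = np.asarray(tss_values, dtype=float)
    keep = scores > scores.mean()
    if not keep.any():  # all algorithms tied: keep every map
        keep[:] = True
    return maps[keep].mean(axis=0)
```

With three algorithms scoring 0.2, 0.8 and 0.9, for example, only the last two exceed the mean TSS and contribute to the averaged map.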

Scientometric analysis
We found 72 scientific studies in the literature that mention the occurrence of bloom events in freshwater environments in Brazil. The orders Chroococcales, Nostocales, Oscillatoriales, and Synechococcales were the most reported. Species of the orders Chroococcales and Nostocales, represented by the genera Microcystis, Raphidiopsis, and Dolichospermum (formerly Anabaena), occurred mainly in the states of São Paulo, Paraná, Rio Grande do Sul and Minas Gerais (Fig. 1). The number of records differed among Brazilian states. In some states we obtained a large number of blooms, while there were no records at all for others.
The highest numbers of blooms found in the literature were in the states of São Paulo, Minas Gerais, Pernambuco, and Rio Grande do Norte. On the other hand, the fewest blooms recorded in the literature were in the Amazon hydrographic basin. Bloom events were reported mainly at sites with high human population concentrations and public supply reservoirs: Acarape do Meio Reservoir (Ceará), Armando Ribeiro Gonçalves Reservoir and Cruzeta Reservoir (Rio Grande do Norte), Carpina Reservoir (Pernambuco), Billings and Guarapiranga Reservoirs (São Paulo), Utinga Reservoir (Belém do Pará) and Juturnaíba Reservoir (Rio de Janeiro).
We observed the first bloom events in 1982 (n = 2) and the highest number of bloom records in 2010 (n = 89) (Fig. 2A). Furthermore, we observed that the years with the highest numbers of blooms are not necessarily the years with the highest numbers of studies. For instance, 2009 presented the highest number of scientific studies and a median number of bloom events. We also observed that species of potentially toxic genera, such as Microcystis, Raphidiopsis, and Dolichospermum, have a wide geographic distribution. The increase in publications in 2005, 2009 and 2014 indicates an increase in the number of researchers in this field of study, as well as its scientific and technological progress, considering that the number of publications is one of the most used measures to quantify scientific production (Debackere et al. 2002). Between 2010 and 2018, the publications did not exceed five studies, demonstrating a small number of scientific studies mentioning bloom occurrences. The lack of studies on the occurrence of cyanobacterial bloom events, as well as the concentration of sampled records in large cities and close to research centers, were the main biases observed. The amount of research published over the years may indicate gaps to be filled in later studies, since cyanobacteria are potentially toxin-producing organisms lethal to aquatic biota and humans. Our findings indicate that the occurrences are located where there is a greater human population density and in public supply reservoirs with a history of persistent blooms. The greater number of events recorded in supply reservoirs may be explained by the criteria related to the growth of cyanobacteria set out in Ministry of Health Ordinance Nº. 2.914, dated December 12, 2011, which, in turn, revoked Ministry of Health Ordinance Nº. 518 of March 25, 2004.
The federal law establishes the need to monitor cyanobacteria in all sources of public supply, thus contributing to the larger number of publications on supply reservoirs.
Bloom events occur mainly in large urban centers, where conditions favor the accumulation of pollutants and the accelerated growth of the phytoplankton community, causing a considerable increase in biomass (Behrenfeld & Boss 2018). This biomass has negative consequences for the efficiency and cost of water treatment, which can lead to the loss of resources destined for public supply when treatment becomes economically unviable (Lorenzi et al. 2018). The blooms were mostly of the genus Microcystis, which strongly shades the other phytoplankton species, hindering their development, reducing the competition rate and eventually reducing the richness and diversity of organisms (Cires et al. 2013). Also, morphological adaptations and the presence of gas vesicles allow buoyancy (van Gremberghe et al. 2011) and access to photosynthetically active radiation, which facilitates its success in aquatic ecosystems (Padisák 1997). In addition, the dense mucilage in cyanobacteria (Reynolds 2007) increases tolerance to high light intensities through acclimation by an increase in the production of photoprotective pigments (Paerl & Otten).
On the other hand, the years with the highest number of studies reporting blooms were 2009 (n = 9), 2005 and 2014 (n = 6), with the first study published in 1994 (n = 1) (Fig. 2B). We found no relationship between the number of blooms and the number of scientific studies (F = 3.231; R² = 0.076; p = 0.083). However, we must point out that our database is composed of scientific studies, is not based on random sampling and probably does not have the same number of repetitions in each region, so the sampling effort is an important factor in the frequency of occurrence of blooms. Thus, although no statistical relationship was observed, it is difficult to conclude that there is no relationship between the number of bloom events and the number of scientific studies.
We observed a relationship between the number of blooms in freshwater environments and population density (R² adjusted = 0.35; p < 0.001).

Cyanobacteria potential distributions
In general, TSS values obtained for the modeled taxa were considered acceptable (greater than 0.5) or good to excellent (greater than 0.7). TSS values for the phylum Cyanobacteria (0.856) and the order Chroococcales (0.882) were the highest. For the orders Nostocales (0.743) and Oscillatoriales (0.657), the values indicate models with good fit (Table 2).
While the phylum Cyanobacteria and the orders Chroococcales and Nostocales showed wide potential distributions across the Northeast, South, Southeast and Midwest regions, the order Oscillatoriales presented a restricted distribution between the Northeast and Southeast regions (Fig. 3). Altogether, no model predicted suitable areas in the Northern region, so that the distribution of the taxa was mainly concentrated in Southeastern Brazil. The prediction for the phylum Cyanobacteria showed that 52.5% of the Brazilian territory is highly suitable for the occurrence of bloom events. The orders Chroococcales, Nostocales and Oscillatoriales showed high suitability in 55.5%, 49.9% and 17.3% of the Brazilian territory, respectively.

Discussion
We observed that the number of publications on bloom events was highest in 2009, showing a decline in 2012, 2013 and 2017. However, blooms have been reported in publications with data since 1982. Our results indicate the highest numbers of freshwater blooms in 2005, 2011 and 2014, and the vast majority of these records occurred in public supply reservoirs. We observed a relationship between the number of blooms in freshwater environments and population density (R² adjusted = 0.35; p < 0.001). (Paerl & Huisman 2008). Also, the success of the genus Raphidiopsis in dispersal is attributed, in large part, to its ability to tolerate journeys along river courses (Rick et al. 2007, Moreira et al. 2015). For the genera Dolichospermum and Microcystis, the wind is an important dispersing agent for phytoplankton (Chrisostomou et al. 2009), as are animals, which can also transport their vegetative forms on their body surface (Padisák et al. 2016). Cyanobacteria occur at environmentally suitable sites, where adequate dispersal rates are paramount for tracking changes in environmental conditions between localities (Heino et al. 2009). Cyanobacterial dominance is associated with high temperatures, and the close relationship between temperature and dominance in water bodies is evident (Cottingham et al. 2015). ENMs can estimate environmentally suitable areas where the knowledge about cyanobacterial geographic distribution is incomplete (Silva et al. 2013), guiding future field surveys (Jensen et al. 2017). In an attempt to reduce the lack of knowledge about the geographic distribution of the cyanobacteria responsible for bloom events, also known as the Wallacean shortfall (Cardoso et al. 2011, Whittaker et al. 2005), the compilation of data from the specialized literature becomes an effective tool to mitigate this problem.
However, it is necessary not only to record the collection biases but also to identify the priority areas for inventories to overcome this problem (Sousa-Baena et al. 2014).
Our ensemble distribution maps revealed that the northern portion of Brazil does not have high suitability for bloom events, and that the greatest number of bloom occurrences is in the Southeast region. This result is the same for all four models. The low suitability observed in the North may reflect the lack of information about bloom events due to the low human concentration. Another explanation for the low occurrence in the North is that a large proportion of the rivers in this region is still preserved and has not been converted into reservoirs. Lentic environments are more prone to blooms than lotic ones. Also, the most frequently reported species are more successful in reservoirs (Komárek et al. 2014). Incomplete geographic distribution data are common in biological datasets from tropical regions (Ballesteros-Mejia et al. 2013, Kamino et al. 2012, Soberón 2007), with the Amazon region being generally under-sampled (Freitas et al. 2012). The distribution data overlap with regions of high human density (Letters & Jan 2013), and the spatial patterns that we observed reflect the activities of Brazilian researchers in reservoirs with a historical persistence of blooms (Lorenzi et al. 2018). Sampling bias is quite common for several biological groups and can have a strong effect on ENM results (Kramer-Schadt et al. 2013).
Although the sampling bias demonstrated here is one of the main reasons that may explain the fragmentary distributional patterns we observed, less appreciated factors may also explain this pattern, such as: (1) material sampled extensively in a given area, causing an accumulation of data to be processed (Hortal et al. 2015); (2) insufficient financial and/or human resources for the identification and curation of species (Fontaine et al. 2012); and (3) social and logistical variables (e.g., accessibility, number of inhabitants of a region, economy) (Whittaker et al. 2005). Even more subjective factors, such as the researcher's preference for certain organisms or regions, may leave the distribution and occurrence scenarios for cyanobacteria incomplete (Ficetola et al. 2014).
Our results indicate that the available data on the geographic distribution of cyanobacterial blooms in freshwater environments are far from complete and have obvious geographical biases. However, our compilation is much more comprehensive than the information available in the literature, as also reported in Freitas et al. (2012). We are aware that many records of cyanobacterial bloom events in freshwater environments were not accessible to us, which may have reduced our ability to assess bloom events in Brazil. In our study, we mapped the available biological data on the cyanobacteria responsible for bloom events and the sampling effort invested in the Brazilian territory.
Our results can provide useful information on current sampling gaps that need further research to improve distribution data on the occurrence of bloom events in public supply reservoirs and support the monitoring practices of these events.
Monitoring practices and risk assessments in water bodies include a proactive approach, encompassing inspection and monitoring programs with specific preventive actions (Huisman et al. 2018). However, the enrichment of freshwater environments by nutrients is considered a major pollution problem worldwide and is also one of the most important factors contributing to the increase in the number of bloom events (Glibert et al. 2008). In Brazil, eutrophication is still on the rise because of the increasing human population in many regions, which increases energy demands, the use of nitrogen (N) and phosphorus (P) fertilizers in agriculture, and the production of meat and animal waste. We have noticed that the monitoring programs developed in Brazil are divided into four types of policies: prevention, restoration, improvement and no action (Caron et al. 2010). In other countries (e.g. the United States of America), advances are being made to detect bloom events and, in some cases, to predict their occurrence and potentially reduce their impacts. The ability to rapidly detect phytosanitary cyanobacteria has progressed greatly, from classical microscopic methods to detection involving specific molecules and genomes, which can be detected with a fluorescent signal (reviewed by Sellner et al. 2003). In addition, the use of remote images, packets and arrays that can detect and provide real-time information about species, as well as physical and chemical parameters, has been enhanced (Stumpf & Tyler 1988, Lopes et al. 2016, Mishra & Mishra 2014). In Brazil, such advances and techniques are still far from being widely used, which may lead to an underestimate of the actual occurrence of blooms in the country, affecting the data available on bloom events (Sellner et al. 2003).
Thus, we hope that this study will stimulate new cyanobacterial sampling and increase efforts to understand and predict algal blooms, in order to reduce their occurrence or impacts in the future. The most effective way to do so is to reduce the entry of nutrients into aquatic environments, since blooms are a widespread problem affecting estuaries, coasts and freshwaters around the world, with effects on ecosystems, human health, and economies.

Conclusions
Using a scientometric analysis and ENMs, we demonstrate that many of the bloom events of phytosanitary cyanobacteria reported in the Brazilian literature involve the toxic genera Microcystis, Raphidiopsis, and Dolichospermum. These genera are broadly distributed in Brazil, respond quickly to current environmental changes, and should certainly occur in areas that were not detected in our scientometric and ENM analyses. Thus, we believe that there are still several sampling gaps to be filled to effectively unravel the geographic distribution of the cyanobacteria that cause blooms in freshwater environments and, consequently, to diminish the effect of the Wallacean shortfall in this group. For instance, the northern portion of Brazil, which still shows low suitability for bloom events compared to other Brazilian regions with a large concentration of urban centers and population, needs to be better sampled, especially in large urban centers. Cyanobacterial overgrowth has been highlighted because of the problems it can cause in aquatic ecosystems and because of its ecological and sanitary interest.

Author Contributions
Ariane Guimarães: Substantial contribution to the conception and design of the work; Contribution to data acquisition; Contribution to the analysis and interpretation of data.
Pablo Henrique da Silva: Substantial contribution to the design of the work; Contribution to the analysis and interpretation of data.
Fernanda Melo Carneiro: Substantial contribution to the design of the work; Contribution to data acquisition.
Daniel Paiva Silva: Substantial contribution to the conception and design of the work; Contribution to the analysis and interpretation of data; Contribution to the critical review, adding intellectual content.

Conflicts of Interest
The authors declare that they have no conflict of interest related to the publication of this manuscript.

Ethics
Our study did not involve humans and/or clinical trials. The manuscript represents original and valid work, and neither this manuscript nor one with substantially similar content under the same authorship has been published or is being considered for publication elsewhere.

Data Availability
The data used were taken from the literature, as described in the Materials and Methods.