How do distribution mapping methods perform in estimating beta diversity at macroecological scales? A case study with Neotropical anurans

: Species distribution mapping methods have their advantages and limitations concerning their use on theoretical and/or applied macroecological approaches. However, it remains underexplored how the estimates of community ecology metrics vary across the distributions generated by different mapping methods. Here, we mapped the distribution patterns of the anuran beta diversity in the Atlantic Forest and Cerrado hotspots generated by three mapping methods: point-to-grid (PTG), extent-of-occurrence (EOO), and ecological niche modelling (ENM) maps, so we were able to compare the congruence of the local contribution to beta diversity index (LCBD) among them, as well as their turnover and nestedness components. PTGs generated the most divergent LCBD values probably due to the more resolved spatial scale in which species’ presence are considered, so EEO and ENM generated similar beta diversity estimates for both hotspots. High LCBD values in the Cerrado were recorded in ecotone regions, whereas in the Atlantic Forest the highest beta diversity values were found along the Atlantic coast. The structure of beta diversity of PTG showed way too high values of importance for the turnover component compared to the EEO and ENM maps, which also recorded higher importance for the turnover than for the nestedness component.


INTRODUCTION
In macroecology, the occurrence records of a species are used to map its geographic occurrence, so the overlap of species occurrences is commonly used to calculate different biological diversity metrics, such as the gradient of species richness (e.g., Gaston & Blackburn 2000, Hulbert & White 2005). Among the different mapping methods available, richness patterns are mostly compared by three methods (Cantú-Salazar & Gaston 2013, García-Roselló et al. 2014, Graham & Hijmans 2006, Hulbert & White 2005, Vasconcelos et al. 2012): a) extent-of-occurrence range maps (EOO), which presume that the distributional area is made up of connected populations. Therefore, EOOs are typically minimally convex polygons encompassing all known point occurrences, which in turn represent the species' extent of occurrence; b) point-to-grid maps (PTG), which consider the species distribution as the occurrence records within a pre-defined grid system that the species is found, and; c) ecological niche model or species distribution model maps (ENM), which is generally built up considering the climatic niche of a species, obtained from the climatic characteristics within its known occurrence records. The climatic niche, which is expressed as the multivariate space of climatic variables best matching the observed species' distribution, is projected onto a landscape of interest, thus assuming that the species distribution is considerably determined by aspects of climate (Araújo & Peterson 2012).
Studies comparing the richness patterns generated by different distribution approaches generally point out higher values of richness estimates when the species distribution is generated by EOO and, even higher, by ENM maps (García-Roselló et al. 2014, Graham & Hijmans 2006, Hawkins et al. 2008, Vasconcelos et al. 2012. These overestimations are usually related to the consideration of species occurrence in areas between the occurrence records (Graham & Hijmans 2006). Conversely, PTG maps are prone to generate decreased values of richness estimates due to the underestimations of species occurrences caused by the insufficient and spatially biased sampling records, as well as the more resolved spatial scale in which species' presence are considered (e.g., Graham & Hijmans 2006, Hawkins et al. 2008, Hulbert & White 2005. Therefore, the choosing of a giving mapping method should consider the inherent advantages and disadvantages for their specific application and/or data availability. Other biological diversity metrics may be explored and mapped by ecologists to quantify different components of biodiversity that can constitute unique biological information to guide effective conservation actions. For instance, the beta diversity is the variation in species composition among sampling units (e.g., Legendre 2014). Therefore, quantifying the beta diversity index for a set of communities (i.e., sampling units) may indicate places having unusual species combinations of high conservation value, or biologically degraded sites that may be good candidates for ecological restoration (Legendre 2014, Socolar et al. 2016. Moreover, the communities may vary in species composition due to two processes (e.g., Baselga 2010, Legendre 2014): a) the species replacement (also called turnover), so a complete differentiation is due to the presence of different species among the communities, and; b) the richness difference or nestedness pattern of species distribution, so the variation in species composition is due to sites having less species that, in case of a typical nested pattern, are a strict subset of the species at a richer site. Therefore, beta diversity influenced by one or another structuring component may indicate different ecological or evolutionary processes, which in turn requires different strategies for an efficient biological conservation (Baselga 2010, Legendre 2014).
Since the quantification of beta diversity requires a species distribution matrix, one may suppose that different mapping methods generate different estimates of beta diversity for a given study region. Da Silva et al. (2016) compared the performances of different distribution methods on the estimates of beta diversity and found that, despite the richness estimates were reasonably similar between range maps and checklists (i.e., a point occurrence approach), this pattern was less congruent for beta diversity. Despite the authors point out the inherent characteristics of each distribution method (i.e., checklists are associated with local processes while range maps are associated with regional processes; Hortal 2008), it remains unclear why such data sources generated similar estimates for species richness, but not for beta diversity. Moreover, the use of ENM maps to generate biological diversity metrics are largely underexplored for beta diversity.
Here, we make use of anurans in two biodiversity hotspots in South America that present intermediate to highest levels of different biological diversity metrics in South America : the Atlantic Forest and Cerrado (Mittermeier et al. 2004).
We aim to perform different beta diversity approaches that are appropriated for estimating and mapping such a metric at macroecological scales (Legendre & De Cáceres 2013). Therefore, we compare the estimates of beta diversity among the anuran compositions generated by EOO, PTG, and ENM methods, separately for each hotspot, and also decompose the beta diversity index to quantify and compare the contributions of replacement and nestedness components among the different mapping methods. Our main goal is to identify congruencies among the different methods that will further support future studies in the use of beta diversity at macroecological scales. Therefore, though we hypothesize that the methods EOO and ENM (inherently more coarse-grained; García-Roselló et al. 2014, Hawkins et al. 2008) will generate more congruent beta diversity patterns between them than the beta diversity generated by PTG maps (more fine-grained; Hulbert & White 2005), we intend to quantify how congruent the beta diversity patterns are among the different methods.

Study area and species data
The Atlantic Forest hotspot (AF) ranges from the northeast to southern Brazil and is made up of different forested biomes (e.g., evergreen forest along the Atlantic coast and semideciduous forest in inland areas) (Mittermeier et al. respectively. In summary, this compilation was performed from open-access digital databases (the Global Biodiversity Facility: www.gbif.org; the SpeciesLink project: http://splink.cria.org.br), surveys at museums and scientific collections with representative anurans from the AF and CER, and literature records. Detailed procedures regarding data cleaning and filtering are found in Vasconcelos et al. (2018) and . At the end, this survey resulted in 512 anuran species with 18,799 occurrence records for the AF and 197 anuran species with 16,378 records for the CER (point records are available in the supplementary material of Vasconcelos et al. 2018). Here, we used a different number of species (see next sections) due to limitations of occurrence records to properly perform ENMs, additions of newly described species, and/or taxonomic changes that occurred after Vasconcelos et al. (2018).

Species database of mapping methods
We obtained the EOO range maps of most species from the International Union for Conservation of Nature portal (IUCN 2017). We then added the species absent in the IUCN database and updated the nomenclature according to Frost (2018). For the species lacking EOO maps, we generated them using the occurrence records. For those species having three or more records, we generated the EOO maps using the function "minimum bounding geometry" in ArcGIS 10.1 that connects the shortest distance between any two vertices of the convex hull (e.g., García-Roselló et al. 2014. For the species having up to two records, their ranges were considered as the area within ~50 km of diameter of each occurrence record (e.g., .
The PTG species maps were generated by overlaying the species occurrences onto a grid system at a resolution of 0.5º of latitude and longitude, so the geographic distribution of each species was represented by their occurrences at these grid cells (García-Roselló et al. 2014, Graham & Hijmans 2006, Hawkins et al. 2008. The ENM maps were generated in a previous study, so a fully description of model building is found in Vasconcelos et al. (2018). In summary, ENMs were generated for 347 anurans in the AF and 153 species for the CER due to the exclusion of species having less than five different occurrence records, which is a limitation for a robust ecological niche modeling (see Vasconcelos et al. 2018, and references therein). The ENM map of each species was generated by an ensemble approach considering four algorithms (generalized linear models, boosted regression trees, random forests, and support vector machines) (sensu Araújo & New 2007) as a function of nine non-collinear climatic variables (Naimi & Araújo 2016) available from the Worldclim portal (http://www.worldclim. org/version1).

Beta diversity and statistical analyses
Due to the limitations in the ENM methodology, we were forced to exclude those species with less than five occurrence records, so the number of ENM species maps are reduced compared to the total number of EOO and PTG maps. All else being equal, this difference in the species number may generate different beta diversity patterns among the species distribution matrix of the different mapping methods. Therefore, we are considering two datasets to analyze the beta diversity patterns: a) a complete dataset, in which we consider all species maps obtained for the respective mapping method, and; b) a matched-dataset, in which we consider only those species having their maps obtained for the three mapping methods (see Supplementary Material - Table SI and SII).
The species maps of each mapping method were overlaid onto a grid system with 0.5º of latitude and longitude. Then, we were able to generate presence/absence matrices for each mapping method and hotspot, as well as for the complete and matched-dataset. Each presence/absence matrix was submitted to the Jaccard dissimilarity coefficient and was used to calculate the local contribution to beta diversity index (LCBD; Legendre & De Cáceres 2013) of each grid cell, thus representing comparative indicators of the uniqueness of each grid cell in terms of the species composition.
The structure of the anuran beta diversity was explored by decomposing the Jaccard dissimilarity coefficient into replacement and nestedness components. Here, we considered the replacement and nestedness concepts of the "Baselga-family" (sensu Legendre 2014), so the type of richness difference pattern considered here is characterized by the species at a site being a strict subset of the species at a richer site (Baselga 2010, Legendre 2014). To do so, we generated triangular plots representing the nestedness, replacement, and similarity The LCBD grid values of the different mapping methods were compared by the percentage difference distance (Legendre & Legendre 2012). In this case, we considered the LCBD values of each mapping method in the rows and the grid identifications in the columns, then we were able to quantify the congruency among the spatial distribution patterns of the anuran beta diversity in the AF and CER generated by the different mapping methods (e.g., Runge et al. 2016.

Common patterns for the AF and CER
For both hotspots and datasets analyzed, we found that the beta diversity patterns generated by the mapping methods EOO and ENM were spatially more congruent among themselves than for the PTG. The beta diversity congruencies between EOO and ENM maps were higher than 90% (Table I and II). The beta diversity congruency between PTG maps and the other mapping methods was higher for the AF anurans than for the CER (Table I and II).

Atlantic Forest
The congruency of PTG and the other mapping methods varied around 75-79% for both datasets analyzed (Table I). This reasonable spatial congruence shows that the highest LCBD values are found in the northeast and southeastern Brazil, mainly associated to the AF coast region (Figure 1; results concerning the complete dataset are shown in the Supplementary Material, then see also Figure S1). Despite the similarity of 75-79% between PTG-EOO and PTG-ENM, the beta diversity decompositions into species replacement and nestedness of PTG are quite divergent compared to the patterns observed for EOO and ENM (Figure 2 and S2). For the matched-dataset, the beta diversity decomposition for PTG shows a very low similarity (0.061), yet the similarity values for EOO and ENM are more similar to each other (0.355 and 0.292) (Figure 2). Though the replacement component is the prevalent one structuring the beta diversity in the AF, its value for the PTG is far higher (0.836) than the values obtained for EEO (0.476) and ENM (0.516) (Figure 2). The results concerning the complete dataset of the beta diversity decomposition of AF anurans are given in Figure S2 and are similar to the matched-dataset.

Cerrado
The congruency of PTG and the other mapping methods was lower than in the AF and varied around 41-44% (Table II), thus indicating that the patterns generated by PTG maps were discrepant compared to the other methods in the CER ( Figure  1, Figure S1). This discrepancy is also evidenced by the beta diversity decomposition analysis; for the matched-dataset, the component species replacement has a higher importance value for PTG maps (0.858) than those ones found for EOO (0.387) and ENM (0.52) (Figure 2). Again, similar results of beta diversity decomposition of CER anurans were found for the complete dataset ( Figure S2). The high congruency found between the EOO and ENM LCBD values in the CER shows that high beta diversity areas are mainly located around the CER rim, in the transition regions to other major biomes (Figure 1 and S2).

DISCUSSION
We found that estimating beta diversity gradients by considering the point occurrence records into a grid system generates quite divergent results compared to other mapping approaches.
Though the general estimates of beta diversity were reasonably congruent between PTG-EOO and PTG-ENM in the AF (congruency of 74-79%), the replacement and nestedness were quite divergent between the same mapping methods. Despite the increase in the grain size generally increases beta diversity estimates (e.g., Ochoa-Ochoa et al. 2014), larger grids would probably equalize the beta diversity estimates among the mapping methods (Hawkins et al. 2008) by considering species occurrence at larger areas of the larger grid cells. Therefore, the divergent beta diversity estimates of PTG maps are mostly attributed to the more resolved spatial scale in which the species distribution is considered; species presence is always considered by in situ individual records at the respective grid cell for PTG maps whereas species presence is sometimes considered by their presumed occurrence in distributional gaps between occurrence records (EOO maps) or places having similar climatic niches compared to that estimated considering the occurrence records (ENM maps). Then, this fact makes PTG maps more prone to "omission errors" of the species distribution estimates (García-Roselló et al. 2014, Graham & Hijmans 2006, Hawkins et al. 2008, Pineda & Lobo 2009, and is reflected here in some grids lacking LCBD values (see detailed discussion on this issue ahead). Another important result is that, even when we were forced to exclude species from the analyses due to limitations in the ENM methodology, the beta diversity estimates generated by EOO and ENM were more congruent between them than to PTG. In this case, the excluded species are microendemics with less than five occurrence records (e.g., Vasconcelos et al. 2012). At broadscales, the exclusion of such species does not substantially affect neither the richness gradients (e.g., Jetz & Rahbek 2002, Vasconcelos et al. 2012 nor the spatial patterns of beta diversity (present study), which also seems to be true to the beta diversity structure because the different datasets generated similar estimates of the species replacement and nestedness components between the methods EOO and ENM (Figure 2 and S2). The estimates of beta diversity in the CER presented even higher divergence of PTG compared to the other two mapping methods. As mentioned for the AF, this is resultant from the higher rates of omission errors generated by the PTG maps, which seems to be amplified in the CER where the occurrence gaps are more pronounced (Vasconcelos & Nascimento 2014). The dark-gray areas of PTG maps ( Figure  1 and S1) represent areas without occurrence records in which the LCBD values could not be estimated. An exploratory point-density analysis on the complete point occurrence dataset (see similar approach in Vasconcelos & Nascimento 2014) evidences that such darkgray areas of PTG maps are also areas with lower representativeness of anuran occurrences (i.e., distribution gaps; Figure S3). The inland areas of the AF concentrate most of these distribution gaps ( Figure S3) and do not seem to generally affect the anuran composition across the AF because the beta diversity estimates of the three mapping methods broadly point out these areas as having a homogeneous species composition (i.e., lower LCBD values). On the other hand, the distribution gaps are more widespread across the CER (Figure S3), so we can find dark-gray cells of PTG maps depicting either lower (e.g., the central-western CER region) or higher LCBD values (e.g., the northeastern and northwestern CER regions) for the EOO and ENM maps ( Figure  1 and S1). Vasconcelos & Nascimento (2014) found that the Brazilian CER has more gaps in the anuran representativeness than in the AF when the occurrence records are surveyed in the online databases GBIF and SpeciesLink. Despite we also considered additional sources of occurrence records (e.g., survey at herpetological collections and the scientific literature), a sufficient covering on the spatial occurrence records is still lacking, mainly for the CER ( Figure S3). Then, when it comes to generate beta diversity estimates from ground-truthed data at finer resolutions, the PTG approach will likely result in more divergent gradient of beta diversity than other coarse-resolution mapping maps (da Silva et al. 2016, present study).
For both hotspots and datasets analyzed, the beta diversity estimates generated by EOO and ENM distribution maps were more similar between them. However, the LCBD values presented smoothed and coarse-grained gradients using the EOO maps, whereas the ENM maps generated a more complex and patchy gradients of beta diversity. The main cause explaining these issues is related to how these maps are generated; EOO maps are generally continuous distributional representations of the connection of the outer point records, whereas the ENM maps are built upon a complex relationship of the environmental variables and the known occurrence records, so the ENM distribution maps might be less uniform and patchier than EOOs (e.g., Vasconcelos et al. 2012).
The coastal region of the AF supports the highest levels of beta diversity as it is shown by the congruent results among the mapping methods and datasets. However, this fact was not so pronounced in the specific southeastern region of the AF for the ENM maps, reinforcing that the absence of many small-ranged species (i.e., with less than five occurrence records) for the ENM dataset decreased the LCBD values in the southeast coast while the same region found high beta diversity estimates for the other two mapping maps. Finally, the lower LCBD values found toward inland areas of the AF for all mapping methods is also related to the presence of more widely distributed species (dos Santos et al. 2009. Higher LCBD values in the CER that are congruent between the methods EOO and ENM are found around the rim of CER. This is interesting because these high beta diversity areas in the CER seems to be highly influenced by the surrounding major biomes, such as the Caatinga xeric environment in the northeast, the Amazonian (in the north) and the AF (in the southeast) forested environment, and the Pantanal chaco area in the southwest. Indeed, Valdujo et al. (2013) found that the beta diversity patterns of endemic CER species, which is approximately 51% of all species in the CER (Valdujo et al. 2012), are related to the environmental conditions of the adjacent biomes. Since most amphibians depend on a humid environment to maintain their physiological needs (e.g., cutaneous gas exchange; Duellman & Trueb 1994), and that the CER has a predominant drier open vegetation formation, it is reasonable to think that the core CER area is made up of a more homogeneous anuran composition, whereas anurans across the ecotone regions are made up of different species pools, which in turn have generated higher variations on species composition in the present study.
In summary, for broad spatial scales, the species mapping methods EOO and ENM generated beta diversity estimates more similar between one another than to the PTG. Apart from the spatial differences between the LCBD values of EOO and ENM, the decompositions of beta diversity into components of species replacement and nestedness are still more similar between them than the divergent results found for PTG. All in all, we do not mean to discourage the usefulness of PTG maps in biogeography. The divergent beta diversity estimates found for PTG maps is a combination of how the maps are conceptually built and the prevalence of knowledge gaps in the occurrence records for a considerable number of species. Filling the geographic gaps to increase the spatial coverage of species distributions may overcome the divergent results we found for PTG maps. Nonetheless, overcoming inadequacies in distributional data represent a mid to longterm challenge that ecologists might still face in the upcoming years, since it depends on continuous and massive incentives from politics and decision-makers to value in situ studies in underexplored areas, especially in the Neotropical region. When we do not have this completeness on occurrence records, ENMs may be a good mapping method to generate diversity gradients (e.g., Vasconcelos et al. 2012, García-Roselló et al. 2014, Ochoa-Ochoa et al. 2014. Therefore, coupled to the multiple biodiversity threats that have been intensified along the last century (Millenium Ecosystem Assessment 2005), we do not have enough time to wait for perfect methods and complete datasets. Regarding theoretical macroecological studies or conservation biogeographic approaches, the mapping methods EOO and/or ENM are recommended over the use of PTG because the first two methods capture similar levels of beta diversity gradients and their decomposition into different structuring components.