Floristic units and their predictors unveiled in part of the Atlantic Forest hotspot: implications for conservation planning

We submitted tree species occurrence and geoclimatic data from 59 sites in a river basin in the Atlantic Forest of southeastern Brazil to ordination, ANOVA, and cluster analyses with the goals of investigating the causes of phytogeographic patterns and determining whether the six recognized subregions represent distinct floristic units. We found that both climate and space were significantly (p ≤ 0.05) important in the explanation of phytogeographic patterns. Floristic variations follow thermal gradients linked to elevation in both coastal and inland subregions. A gradient of precipitation seasonality was found to be related to floristic variation up to 100 km inland from the ocean. The temperature of the warmest quarter and the precipitation during the coldest quarter were the main predictors. The subregions Sandy Coastal Plain, Coastal Lowland, Coastal Highland, and Central Depression were recognized as distinct floristic units. Significant differences were not found between the Inland Highland and the Espinhaço Range, indicating that these subregions should compose a single floristic unit encompassing all interior highlands. Because of their ecological peculiarities, the ferric outcrops within the Espinhaço Range may constitute a special unit. The floristic units proposed here will provide important information for wiser conservation planning in the Atlantic Forest hotspot.


INTRODUCTION
The patterns of geographic distribution of plant taxa are directed by a complex set of variables and interrelationships (Rizzini 1997). Among these, climatic variables deserve to be highlighted in mesoscale approaches because several studies indicate them as the main predictors of phytogeographic patterns (e.g., Engelbrecht et al. 2007, Oliveira-Filho et al. 2005, Scudeller et al. 2001).
Climatic variables, especially those related to temperature and precipitation, are important because they directly influence plant development and are responsible for floristic changes along gradients (Grubb 1977, Pausas and Austin 2001). Precipitation gradients are quite complex and are influenced by factors such as surface roughness, distance to large water sources (oceans, inland seas and lakes) and air mass features (Wulf et al. 2010). The equally complex temperature gradients are linked to latitudinal changes in solar radiation (Breckle 2002, Kessler et al. 2011) and altitudinal changes in air pressure, humidity, and cloud cover (Körner 2007, Homeier et al. 2010).
Geographic location is also a relevant factor in floristic relationships among sites because floristic similarity is likely to increase with spatial proximity (e.g., Scudeller et al. 2001, Oliveira-Filho et al. 2005, White and Hood 2004). Because of this, recent studies (e.g., Chain-Guadarrama et al. 2012, Gasper et al. 2013) have added spatial filters in statistical analyses to minimize the effect of spatial autocorrelation on the interpretation of the relationship among species distributions and environmental variables (Eisenlohr 2013, Legendre and Legendre 2012, Zimmermann et al. 2010).
In the Atlantic Forest of southeastern Brazil, a region with high diversity and endemism (França and Stehmann 2013, Rolim et al. 2006, Saiter et al. 2011), phytogeographic studies indicate that changes in tree composition are related to at least three major climatic gradients: a coastal-inland gradient of precipitation seasonality (Oliveira-Filho and Fontes 2000, Oliveira-Filho et al. 2005, Santos et al. 2011), a latitudinal gradient of temperature (Oliveira-Filho and Fontes 2000, Oliveira-Filho et al. 2005), and an altitudinal gradient of temperature and humidity (Torres et al. 1997, Oliveira-Filho and Fontes 2000, Oliveira-Filho et al. 2005, Bertoncello et al. 2011, Kamino et al. 2008). Such phytogeographic patterns, however, are generalizations for a large and environmentally complex region (see Oliveira-Filho et al. 2005), and it is possible that they do not accurately explain the floristic variation at finer mesoscales.
Understanding subregional floristic patterns remains a challenge for elucidating the phytogeography of southeastern Brazil, given that the Brazilian flora, as a whole, remains undercollected (Sobral and Stehmann 2009). Furthermore, policies for forest conservation, such as those related to the creation of parks and reserves, restoration of ecosystems, and sustainable use of natural products, can be more appropriately planned when biological-environmental changes throughout the region are better known (McShea et al. 2014).
Within southeastern Brazil, the Doce River Basin (DRB) is an interesting region for investigating subregional phytogeographic patterns because of its high number of forest inventories and botanical collections. The DRB also shows high environmental heterogeneity, with three coastal subregions (Sandy Coastal Plain, Coastal Lowland, and Coastal Highland) and three inland subregions (Central Depression, Interior Highland, and Espinhaço Range) recognized through geomorphology and climatic features (Cupolillo et al. 2008, Instituto Mineiro de Gestão das Águas 2010, Nascimento et al. 2012).
Our goals were to answer the following questions using floristic and geoclimatic data: [1] Which climatic variables better explain subregional phytogeographic patterns in the DRB? [2] Does spatial proximity among sites influence these patterns? [3] Do the subregions of the DRB deserve to be treated as distinct floristic units? We addressed these questions considering, in particular, the implications for biological conservation in the Atlantic Forest hotspot.

STUDY AREA
The Doce River Basin (DRB) encompasses approximately 87,000 km² in the eastern region of the state of Minas Gerais and the central and northern portions of the state of Espírito Santo in southeastern Brazil (Instituto Mineiro de Gestão das Águas 2010). The DRB is limited to the north by the Negra and Aimorés Mountains, to the west by the Espinhaço Range, to the southwest by the Mantiqueira Range, to the southeast by the Caparaó Range, and to the east by the Atlantic Ocean (Instituto Mineiro de Gestão das Águas 2010; Fig. 1). We added the small Barra Seca River Basin to the final stretch of the DRB in order to include Sandy Coastal Plain and Lowland sites of northern Espírito Santo in the dataset.
For this study, in agreement with the geoclimatic features (see Table I

PREPARING THE DATASET
The dataset was composed of a binary matrix of occurrence records of tree species and a geoclimatic matrix of 59 sites within the DRB (Fig. 1 and Table I). Although five sites are located outside the boundaries of the DRB, they were included in the database due to their proximity to headwaters of tributaries (i.e., Santa Teresa, Santa Maria de Jetibá, Venda Nova do Imigrante, and Caparaó) or to the mouth of the Doce River (i.e., Regência). The matrices were extracted from the database NeoTropTree (see details at http://www.icb.ufmg.br/treeatlan/; Oliveira-Filho 2014). The binary matrix had 2,021 tree species and 16,835 occurrence records. The geoclimatic matrix was composed of the subregion code and 31 quantitative variables: three spatial variables (latitude, longitude and distance to ocean), plus raster data at 1-km resolution including one topographic variable (elevation), extracted from the U.S. Geological Survey's HYDRO1k Elevation Derivative Database (http://eros.usgs.gov/), and 27 climatic variables. Nineteen climatic variables were obtained from WorldClim 1.4 at approximately 1-km resolution (Hijmans et al. 2005), and three other variables (potential evapotranspiration, actual evapotranspiration, and an aridity index) from Zomer et al. (2007), based on WorldClim's data. The mean duration (in days) and severity of water deficit (amount in mm) were extracted from Walter's diagrams (Walter 1985). The mean frequency of frosts (in days), the percentage of cloud coverage, and the cloud interception (amount in mm) were obtained from Jones and Harris (2008).

PRESETS
We first carried out an outlier analysis (McCune and Grace 2002; cut-off 2.0) and removed three sites in the SP subregion and one site located in 'canga' (a type of ferric outcrop) of the southernmost ER (Ouro Preto). We also excluded 510 singletons (species with only one occurrence record) as they could not contribute to the most important ordination patterns (Lepš and Šmilauer 2003). After these procedures, the final matrix comprised 55 sites, 1,518 species, and 15,959 occurrence records.
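The singleton filter can be illustrated with a short sketch. It is written in Python for illustration only and is not the software used in the study; the function name `drop_singletons` and the toy matrix are hypothetical.

```python
def drop_singletons(matrix):
    """Drop species (columns) recorded at fewer than two sites.

    `matrix` is a list of rows (sites) holding 0/1 presence values.
    Returns the filtered matrix and the indices of the columns kept.
    """
    n_species = len(matrix[0])
    # Number of sites at which each species occurs.
    counts = [sum(row[j] for row in matrix) for j in range(n_species)]
    keep = [j for j, count in enumerate(counts) if count > 1]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, keep


# Toy example: 3 sites x 4 species; species 1 and 2 are singletons.
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
filtered, kept = drop_singletons(toy)
print(kept)
print(filtered)
```

Note that this filter also drops any species left with no records at all, which can happen once outlier sites have been removed.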

FLORISTIC DIFFERENTIATION OF SUBREGIONS
We used Nonmetric Multidimensional Scaling (NMS) ordination with Sørensen's similarity coefficient to create dimensions representing the main gradients of species composition within the dataset. The NMS analysis was performed in the software PC-ORD 6 (McCune and Mefford 2011).
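For readers unfamiliar with the Sørensen coefficient, the sketch below computes it for presence/absence data as 2a / (2a + b + c), where a is the number of shared species and b and c are the species unique to each site. It is a Python illustration, not PC-ORD's implementation; the function names and the toy sites are hypothetical.

```python
def sorensen_similarity(site_a, site_b):
    """Sørensen similarity for two 0/1 presence vectors of equal length."""
    shared = sum(1 for x, y in zip(site_a, site_b) if x and y)
    # Richness of site A plus richness of site B equals 2a + b + c.
    total = sum(1 for x in site_a if x) + sum(1 for y in site_b if y)
    return 2 * shared / total if total else 0.0


def similarity_matrix(sites):
    """Pairwise Sørensen similarities among all sites."""
    return [[sorensen_similarity(a, b) for b in sites] for a in sites]


# Toy example: 3 sites x 4 species.
toy_sites = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
sim = similarity_matrix(toy_sites)
for row in sim:
    print([round(value, 3) for value in row])
```

An ordination such as NMS would typically take the complement (1 minus similarity) as its input dissimilarity.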
We evaluated the differentiation among five of the six subregions through ANOVA of the gradient scores (dimensions 1 and 2) that emerged from the NMS, followed by Tukey's post hoc test adapted for unequal samples (Smith 1971) when the F test was significant. The assumptions of normality of residuals and homoscedasticity were confirmed by the D'Agostino and Levene tests, respectively (Zar 2010).
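The F statistic behind a one-way ANOVA can be sketched as follows. This is a simplified Python illustration: it omits the spatial covariates and the post hoc test, it does not compute a p-value, and the function name and toy scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ordination scores for three hypothetical subregions.
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(round(f_stat, 3), df_b, df_w)
```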
The spatial structure of the ANOVA was accounted for by adding MEM spatial filters (Moran's Eigenvector Maps; Dray et al. 2006), which were created with the 'spacemakeR' package and forward selected (Blanchet et al. 2008) with the 'packfor' package in R (R Core Team 2011). We created the MEMs from a Delaunay triangulation connectivity matrix, applying 'min-max' weighting to the intensity of connections in the calculation of the matrix product (Borcard et al. 2011, Kelejian and Prucha 2010). The fraction explained by the treatment (i.e., the five subregions) was partitioned from the fraction explained by the selected MEMs with the aim of controlling the inflation of Type I error (Peres-Neto and Legendre 2010). In the post hoc tests, the selected MEMs were held as covariates.
As a solution for interpreting the identity of the SP subregion (in which all sites were excluded as outliers) and the CL subregion (whose small sample size reduced statistical power, diminishing the chance of finding significant results), we performed a cluster analysis (Unweighted Pair Group Method, UPGMA) using Sørensen's similarity coefficient, as available in PC-ORD 6 (McCune and Mefford 2011). We obtained the cophenetic correlation coefficient and conducted the Mantel test (999 permutations) to check the consistency between cophenetic values and original similarity values. By doing so, we verified the reliability of groups (floristic units) that emerged from the dendrogram.
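The cophenetic correlation can be made concrete with a small sketch: run average-linkage (UPGMA) clustering on a dissimilarity matrix, record the height at which each pair of sites is first joined, and correlate those heights with the original dissimilarities. The Python code below is a minimal illustration rather than PC-ORD's routine; the function names and the toy matrix are hypothetical, and the Mantel permutation test is not included.

```python
from itertools import combinations
from statistics import mean


def upgma_cophenetic(dist):
    """Cophenetic distances from average-linkage clustering of `dist`."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # UPGMA: mean of all pairwise distances between members.
            d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cophenetic_correlation(dist):
    """Correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    original = [dist[i][j] for i, j in pairs]
    implied = [coph[i][j] for i, j in pairs]
    return pearson(original, implied)


# Toy dissimilarities (e.g., 1 - Sørensen similarity) among four sites.
toy_dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
coph = upgma_cophenetic(toy_dist)
r_coph = cophenetic_correlation(toy_dist)
print(round(r_coph, 3))
```

Values close to 1 indicate that the dendrogram preserves the original pairwise dissimilarities well.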
In addition, indicator species were obtained by calculating the Phi coefficient (Tichý and Chytrý 2006) in each floristic unit using the PC-ORD 6.0 software (McCune and Mefford 2011). The significance of the Phi coefficient was tested using 999 Monte Carlo permutations. In order to obtain a set of the most representative indicator species, we selected only those with Phi coefficient ≥ 0.95 and/or p ≤ 0.001.
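As an illustration of how a Phi coefficient measures fidelity of a species to a group of sites, the sketch below uses the basic 2 x 2 contingency form of Phi. This is an assumption made for illustration: the text does not state whether a group-size-equalized variant was applied, and the function and argument names are hypothetical.

```python
from math import sqrt


def phi_coefficient(n_sites, n_group, n_occ, n_occ_in_group):
    """Basic Phi coefficient of association between a species and a group.

    n_sites         total number of sites
    n_group         number of sites in the target group
    n_occ           number of sites where the species occurs
    n_occ_in_group  occurrences of the species inside the target group
    """
    denominator = sqrt(
        n_occ * n_group * (n_sites - n_occ) * (n_sites - n_group)
    )
    if denominator == 0:
        raise ValueError("Phi is undefined when a margin is empty or full")
    return (n_sites * n_occ_in_group - n_occ * n_group) / denominator


# A species found at all 3 sites of a group and nowhere else (10 sites).
perfect = phi_coefficient(10, 3, 3, 3)
# A species found at 3 sites, none of them in the group.
avoided = phi_coefficient(10, 3, 3, 0)
print(perfect, round(avoided, 3))
```

Phi ranges from -1 to 1; a species both exclusive to a group and present at all of its sites reaches 1.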

CORRELATIONS AMONG FLORISTICS AND GEOCLIMATIC VARIABLES
We undertook a posteriori correlations among NMS dimensions and geoclimatic variables using Pearson's correlation and linear regression models (OLS). Here we did not use latitude and longitude, since their influence on floristic gradients was checked through Moran's correlograms (see below). We pre-selected some variables that showed clear relationships with floristic patterns in each dimension (visual analysis), and then eliminated co-linearities, excluding redundant variables with lower explanatory power. We considered co-linearity to exist when the variance inflation factor (VIF) of a variable was greater than 10 (e.g., Quinn and Keough 2002). Then, we selected the models adopting the best balance between parsimony and accuracy of data description (i.e., the lowest value of AICc, the corrected Akaike Information Criterion; Burnham and Anderson 2002). We confirmed the models' assumptions considering the cautions indicated by Eisenlohr (2013). Specifically for normality of residuals, we used the D'Agostino-Pearson test (Zar 2010). Since we detected non-normal residuals in the models, we excluded outliers identified among studentized residuals (values > 2).
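The two screening quantities mentioned here, VIF and AICc, can be sketched for the simplest case. The Python code below is an illustration under stated assumptions: it uses the least-squares form AIC = n ln(RSS/n) + 2k, counts k as the number of estimated parameters including the error variance (conventions differ), and computes VIF only for a pair of predictors, where it reduces to 1 / (1 - r²). All names and data are hypothetical.

```python
from math import log


def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor; returns a summary dict."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    tss = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return {"slope": slope, "intercept": intercept,
            "rss": rss, "tss": tss, "r2": 1 - rss / tss}


def aicc(rss, n, k):
    """Small-sample corrected AIC for a least-squares model."""
    aic = n * log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def vif_two_predictors(x1, x2):
    """VIF when only two predictors are present: 1 / (1 - r^2)."""
    return 1 / (1 - fit_simple_ols(x1, x2)["r2"])


# Toy data: ordination scores (y) against one climatic variable (x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_simple_ols(x, y)
aicc_linear = aicc(model["rss"], len(x), k=3)   # intercept, slope, variance
aicc_null = aicc(model["tss"], len(x), k=2)     # intercept, variance
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(model["r2"], 4), aicc_linear < aicc_null, round(vif, 3))
```

With more than two predictors, the VIF of each variable would instead come from regressing it on all the others.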
To evaluate gradients of species composition as a function of geographical distance, we verified the spatial structure of scores of each significant dimension of NMS through correlograms created by the software SAM 4.0 (Rangel et al. 2010) adopting Moran's I coefficient (Legendre and Fortin 1989) and following the default options. We tested the global significance of correlograms using Bonferroni's sequential correction, and confirmed the existence of spatial structure when at least one distance class was significant (Fortin and Dale 2005).
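Each point of a correlogram is a Moran's I value computed for one distance class. The sketch below shows that calculation with binary weights (1 for pairs of sites whose distance falls inside the class, 0 otherwise). It is a Python illustration, not SAM's implementation; the function name, the planar coordinates and the class limits are hypothetical, and no significance test is included.

```python
from math import dist


def morans_i(values, coords, d_min, d_max):
    """Moran's I for pairs of sites with d_min < distance <= d_max."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    cross_products = 0.0
    weight_sum = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min < dist(coords[i], coords[j]) <= d_max:
                cross_products += dev[i] * dev[j]
                weight_sum += 1
    if weight_sum == 0:
        raise ValueError("no site pairs fall in this distance class")
    return (n / weight_sum) * (cross_products / sum(d * d for d in dev))


# Toy transect: scores increase steadily along five equally spaced sites.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
i_first_class = morans_i(scores, positions, 0.0, 1.0)
print(round(i_first_class, 3))
```

Repeating the calculation for successive distance classes and plotting the values against distance yields the correlogram.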
Since spatial structure in both response variables and predictors can inflate the Type I error (Peres-Neto and Legendre 2010), we also prepared correlograms for each individual variable, regardless of the residuals independence assumption (Landeiro and Magnusson 2011). Because we found spatial structure in all variables, we also prepared spatially explicit models to confirm the significances found. We also prepared partitioned models in the same manner as described above. Because the significance of these models was supported, we opted to show the results of the simplest models, i.e., without inclusion of MEMs.
For the variance partitioning, the occurrence data were Hellinger transformed (Legendre and Gallagher 2001) and the MEMs were forward selected. We then processed two Canonical Redundancy Analyses (RDA), the first involving species and environment, and the second involving environment and space (MEMs). Note that the variance partitioning makes the removal of co-linear variables unnecessary (Oksanen et al. 2013). We also tested the significance of pure fractions [a] and [c] by permutation-based ANOVA.
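Two pieces of this procedure are simple enough to sketch: the Hellinger transformation (square root of each value divided by its site total) and the arithmetic that splits explained variation into a pure environmental fraction, a shared fraction, a pure spatial fraction and a residual. The Python code below is an illustration only; it does not run an RDA, the R² inputs are made-up numbers (in practice they would usually be adjusted R² values from the constrained ordinations), and the function names are hypothetical.

```python
from math import sqrt


def hellinger(matrix):
    """Hellinger-transform a site-by-species matrix (rows are sites)."""
    transformed = []
    for row in matrix:
        total = sum(row)
        transformed.append(
            [sqrt(value / total) if total else 0.0 for value in row]
        )
    return transformed


def partition_fractions(r2_env, r2_space, r2_both):
    """Split explained variation given three R-squared values.

    r2_env    variation explained by environmental predictors alone
    r2_space  variation explained by spatial predictors alone
    r2_both   variation explained by both sets together
    """
    return {
        "pure_environment": r2_both - r2_space,
        "shared": r2_env + r2_space - r2_both,
        "pure_space": r2_both - r2_env,
        "unexplained": 1 - r2_both,
    }


# One toy site with four of five species present.
transformed = hellinger([[1, 1, 0, 1, 1]])
# Made-up R-squared values, for illustration only.
fractions = partition_fractions(r2_env=0.5, r2_space=0.4, r2_both=0.6)
print(transformed[0])
print({name: round(value, 3) for name, value in fractions.items()})
```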

FLORISTIC DIFFERENTIATION OF SUBREGIONS
The NMS provided a two-dimensional solution (Fig. 3), with a final stress value of 16.97. Dimensions 1 and 2 reproduced 56.3% and 24.6%, respectively, of the variation in similarity in the original n-dimensional space. We found significant differences among the CL, CH and CD subregions (Table II). The differences, however, were not significant between IH and ER (with the Canga site excluded), indicating a floristic similarity of the whole set of inland sites located at higher elevations (Table II). The floristic identity of IH plus ER was confirmed by UPGMA (Fig. 2; cophenetic correlation coefficient = 0.90; two-tailed Mantel test, p = 0.001), because sites of IH and ER were integrated in the same group (with the exception of the Conselheiro Pena and Serra do Ambrósio sites), in the middle portion of the dendrogram. The UPGMA also confirmed the floristic identity of the SP subregion and CL subregion through the emergence of discrete groups (floristic units). The Canga site of the southernmost ER constituted a floristic unit separated from other localities in the Espinhaço Range. Indicator species of six floristic units (i.e., SP, CL, CH, CD, IH plus ER, and Canga) emerged from the analyses and are listed in Table IV. Such indicator species are the most frequent and exclusive species in each floristic unit.

CORRELATIONS AMONG FLORISTIC COMPOSITION AND GEOCLIMATIC VARIABLES
The most highly correlated variables (r² > 0.6) with the NMS dimensions are shown in Table III. The first dimension was effective in the segregation of sites along a thermal gradient, so that warmer sites were positioned to the right and colder sites to the left in the ordination diagram (Fig. 3). The variable most highly correlated with this dimension was mean temperature of the warmest quarter (r² = 0.803), which was strongly co-linear with elevation (r < -0.9), the aridity index (r < -0.9), and the following thermal variables (r > 0.9): mean temperature of the coldest quarter, mean temperature of the driest quarter, mean temperature of the wettest quarter, minimum temperature of the coldest month, and maximum temperature of the warmest month. Note that co-linearity is not critical here because we are dealing with an initial exploratory analysis instead of any inferential test.
The second NMS dimension summarizes the segregation of sites along a gradient of precipitation seasonality, with sites showing lower seasonality (1-3 months) occupying the upper portion of the diagram, and sites with higher seasonality (4-5 months) occupying the lower portion of the diagram (Fig. 3). The variable most highly correlated with this dimension was precipitation of the coldest quarter (r² = 0.718), which had strong co-linearity with precipitation of the driest month (r > 0.9), precipitation of the driest quarter (r > 0.9), and seasonality of precipitation (r < -0.9). The second dimension was only moderately correlated with longitude (r² = 0.53) and distance to ocean (r² = 0.51).
The best OLS models for each dimension were generated with only one significant climatic variable. The model that explains the score variation of NMS Dimension 1 (adjusted r² = 0.803; F = 216.181; p < 0.001) had temperature of the warmest quarter as a predictor (p < 0.001), while the model for score variation of NMS Dimension 2 (adjusted r² = 0.718; F = 134.704; p < 0.001) had precipitation in the coldest quarter as a predictor (p < 0.001).

VARIANCE PARTITIONING
We summarized the results of variance partitioning in a Venn diagram shown in Figure 4. The fractions were significant (p < 0.01) for both environment (climate + elevation) and space, although the first explains a greater fraction of variance than the second. The fraction explained in the intersection of environment and space was also greater than the fraction of "pure" space. The unexplained fraction was high (80%).

DISCUSSION
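Our results indicate that both precipitation and thermal variables were important for the distribution of tree species in the Doce River Basin (DRB). Precipitation variables related to seasonality were more correlated to NMS Dimension 2 (Table III) and contributed to explaining the differences between two floristic sets: the coastal (CL and CH) and the inland subregions (CD, IH, and ER). Nevertheless, such variables were not strongly correlated to longitude or distance to ocean, as suggested by Oliveira-Filho and Fontes (2000) and Oliveira-Filho et al. (2005) for forests of southeastern Brazil, and by Santos et al. (2011) specifically for DRB. The absence of direct relationships among precipitation variables and longitude or distance to ocean in the DRB can be explained by the intense subsidence of dry air provided by the South Atlantic Subtropical Anticyclone, which encroaches on southeastern Brazil in winter and blocks humid air masses that cause frontal rain (Cupolillo et al. 2008). Due to this atmospheric blockage, the duration of the dry seasons tends to be uniform throughout the interior of the DRB (Cupolillo et al. 2008).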
In fact, most inland sites in our dataset have a dry period of 4-5 months and precipitation in the driest quarter of less than 70 mm. These conditions induce deciduousness in 20-50% of trees, an important physiognomic feature used in the classification of forests in these three subregions (Veloso et al. 1991). On the other hand, in coastal portions of the DRB (subregions SP, CL, and CH), humidity from the ocean can influence the winter rain patterns, promoting shorter and wetter dry periods (Cupolillo et al. 2008). The absence of pronounced droughts in the CH promotes the occurrence of tropical moist forests (Saiter et al. 2011), but in the CL some studies reported levels of deciduousness in the tree stratum that suggest the occurrence of semideciduous forests (Rolim et al. 2005, 2006).
Although diminished, the ocean's influence seems to continue up to ca. 100 km from the coast, encompassing the CD sites in Espírito Santo (Colatina, São João de Petrópolis, Itaguaçu, Águia Branca, Alto Liberdade, Pancas, and São Gabriel da Palha). These sites have a shorter dry period (2-3 months) and a higher amount of precipitation in the driest quarter (ca. 100 mm) than other CD sites. There is a linear gradient of precipitation seasonality from the coast, west to the border of Espírito Santo and Minas Gerais. Farther west, the variables related to seasonal distribution of rainfall do not appear to vary with distance from the ocean.
In the ordination diagram (Fig. 3), the disruption of floristic gradients was also noted, because some CD sites in Espírito Santo were closer to CL sites than to other inland sites. In addition, the majority of inland sites were aligned along NMS Dimension 1, indicating that floristic variations among subregions were more correlated to temperature associated with elevation than to seasonal distribution of precipitation. The strong correlation of NMS Dimension 1 with elevation and temperature variables suggests a thermal control on species distribution following elevational gradients in both coastal and inland portions of the DRB. The ecological effect of elevation is related to its influence on geophysical factors that directly affect plant growth, such as air pressure, wind, humidity, insolation, and temperature (Grubb et al. 1977, Körner 2007). Adaptations of species to different levels of these factors drive the floristic composition along altitudinal gradients (Grubb 1977, Pausas and Austin 2001, Scarano 2002, Kessler et al. 2011).
At lower elevations (in general < 600 m), Euphorbiaceae, Leguminosae, and Sapotaceae are the richest and most abundant families, although some genera of Lauraceae (mainly Ocotea) and Myrtaceae (Eugenia, Marlierea, and Myrcia) are still represented by many species (Guedes-Bruni et al. 2006, França and Stehmann 2013, Lombardi and Gonçalves 2000, Rolim et al. 2006). In turn, indicator species of low-elevation units (SP, CL, and CD) mostly belong to these families and genera (see Table IV).
These characteristics support the significant differences found in NMS Dimension 1 between the CL and the CH, and between the CD and the IH plus ER. Considering this last case, we were unable to find significant differences between the IH and ER, despite some distinctive environmental conditions in our dataset, such as the fact that ER sites are located at higher elevations and in colder and rainier places than the IH sites. Physiognomic differences are remarkable between the IH, where forests predominate, and the ER, where shallow, sandy and dry soils induce the dominance of savannas ('campos rupestres'; Giulietti and Pirani 1988, Kamino et al. 2008). In the ER, forests occur in islands ('capões') or are connected to valley forests, where deeper and moister soils allow the development of trees 7-15 meters in height (Giulietti and Pirani 1988, Kamino et al. 2008).
The absence of significant differences, however, indicates that forests in the ER share many tree species with the IH forests, i.e., those able to tolerate low temperatures and shallow, dry soils; these are probably the indicator species of the unit IH plus ER. The resemblance among the IH and ER forest sites agrees with Carmo and Jacobi (2013), Giulietti and Pirani (1988), and Kamino et al. (2008) regarding strong influences of the Atlantic Forest Domain on the tree flora in forest patches and gallery forests of the Espinhaço Range.
The SP was treated differently because the sites, due to their lower floristic richness when compared to other subregions, were considered outliers. The flora of the SP was derived from the migration of taxa from the mesic coastal forests (Rizzini 1997), but its floristic poverty may be explained by the insufficient time for speciation (Scarano 2002), since its origin is the Upper Quaternary after the last transgression occurred ca. 5,100 yr BP (Martin et al. 1993). Despite this, the three sites of the SP formed a distinct group in cluster analysis, supporting the floristic identity of this subregion.
Another outlier case involved the Canga site of the southernmost ER that hosts forest patches on ironstone outcrops. This site comprises a group separated from other ER sites in the dendrogram (Fig. 2). Here, the richness of the tree stratum is lower than in other ER forests (Kamino et al. 2008), and can be explained by evolutionary selection for taxa that can tolerate low temperatures and substrates with a high percentage of toxic metals (metallophyte functional group; Carmo and Jacobi 2013). In fact, some indicator species of the Canga belong to metallophyte genera recognized by Carmo and Jacobi (2013), such as Eremanthus, Ilex, Trembleya, and Weinmannia. Notwithstanding this poverty in tree species, high levels of both richness and endemism have been recorded in the herbaceous and shrubby strata of Canga (Jacobi and Carmo 2008, Carmo and Jacobi 2013).
The influence of spatial proximity in the patterns reported here cannot be disregarded. Although environmental factors have been very important in the explanation of floristic variation, the fraction explained purely by space was significant as well. We also assumed that the fraction explained by spatial processes was not overestimated because an extensive set of environmental predictors was also used. Therefore, the distribution of tree species along the DRB could be determined in part by their dispersal limitation. In addition, an important portion of the floristic variation can be attributed to environmental resemblances among sites that are geographically close (the spatially structured environment).
The high unexplained fraction (80%) of the floristic variation can be attributed to undetermined residuals related to variables that were not included in the analyses. In general, biotic interactions (including anthropogenic effects) and historical events are not included in biogeographical analyses due to the difficulty of quantifying them (Zimmermann et al. 2010); this exclusion is relatively common in vegetation studies (ter Braak 1987).
We can conclude, therefore, that: [1] Thermal variables (particularly temperature of the warmest quarter), associated with precipitation variables (particularly precipitation of the coldest quarter), were the most important factors for determining tree species distribution in the DRB; [2] Site spatial proximity significantly influenced the explanation of patterns due to the limited dispersal of species and spatially structured climatic variables; [3] Considering floristic and environmental features, the subregions SP, CL, CH and CD can be treated as distinct floristic units. The subregions IH and ER, however, should be recognized as a single floristic unit encompassing most of the interior highlands. Furthermore, the 'cangas' of the southernmost ER should be considered as a distinct unit due to their unique ecological features, as suggested by Jacobi and Carmo (2008).
Although general phytogeographic patterns can provide theoretical support for conservation planning in the Atlantic Forest at a coarse level, the stakeholders (mainly the government and conservation organizations) should always be aware of the ecological criteria determining discrete floristic units within a given region. Even if the floristic knowledge is insufficient, consideration of geomorphology and climate at finer scales in subregions defined for other purposes (e.g., geopolitics or water management) can lead to wiser biodiversity conservation planning. Parks and reserves should be created in each of these subregions, in order to protect most of the plant species, especially the endemic and indicator ones. In forest restoration, species selection should respect the tree composition in nearby forests that are floristically similar.

Figure 1 - Location of the Doce River Basin encompassing portions of the states of Minas Gerais and Espírito Santo in southeastern Brazil. Sites of the six subregions adopted in this study are highlighted.
VARIANCE PARTITIONING
Following the protocol suggested by Eisenlohr (2014), we performed a variance partitioning of the explanation provided by: [a] only climatic variables + elevation, [b] the spatially structured fraction of these variables, [c] only spatial components, and [d] factors not measured. Here we used the packages 'vegan', 'spacemakeR', 'packfor' and 'spdep' in R (R Core Team 2011).

Figure 2 - Cluster analysis (UPGMA) for 59 sites (including outliers) within the Doce River Basin, southeastern Brazil. Legend of symbols is available in Figure 1. The codes of sites are described in Table I.

Figure 3 - Diagram of ordination analysis (NMS) produced from floristic data of 55 sites within the Doce River Basin, southeastern Brazil. Symbol legend is available in Figure 1. The site codes are shown in Table I.

more correlated to NMS Dimension 2 (Table III) and contributed to explaining the differences between two floristic sets: the coastal (CL and CH) and the inland subregions (CD, IH, and ER). Nevertheless, such variables were not strongly correlated to longitude or distance to ocean, as suggested by Oliveira-Filho and Fontes (2000) and Oliveira-Filho et al. (2005) for forests of southeastern Brazil, and by Santos et al. (2011) specifically for DRB. The absence of direct relationships among precipitation variables and longitude or distance to ocean in the DRB can be explained by the intense subsidence of dry air provided by the South Atlantic Subtropical

Figure 4 - Venn diagram with variance partitioning between environment (climate + elevation) and space in the Doce River Basin, southeastern Brazil.