Phylogeography of the banded butterflyfish, Chaetodon striatus, indicates high connectivity between biogeographic provinces and ecosystems in the western Atlantic

Among the four butterflyfishes of the genus Chaetodon present in the western Atlantic, the banded butterflyfish Chaetodon striatus has the largest distribution range, spanning 44 degrees of latitude (from Massachusetts, USA to Santa Catarina, Brazil). Although the ecology of the banded butterflyfish has been well studied over its entire range, nothing is known about its phylogeography and how biogeographic barriers structure its populations. To assess the level of genetic connectivity among populations from distinct biogeographic provinces and environmental conditions, we collected samples from seven localities: Puerto Rico, in the Caribbean, and Tamandaré, Salvador, Abrolhos, Trindade Island, Arraial do Cabo and Florianópolis, in Brazil. One nuclear (rag 2) and two mitochondrial (control region and cyt b) molecular markers were sequenced. Our findings are consistent with a recent population expansion, around 30–120 thousand years ago, which was found for all populations. Haplotype network analyses point to the Caribbean as a refugium before the population expansion. Results show no geographic pattern of genetic diversity. Indeed, a lack of population structure was found and no isolation was observed across oceanographic barriers, as well as between coral and rocky reef ecosystems. Furthermore, no directionality in the migration pattern was found among populations. Since ecological and environmental characteristics are very diverse across such a vast geographic range, the lack of genetic differentiation suggests that C. striatus evolved ecological plasticity rather than local adaptation in the western Atlantic.


INTRODUCTION
The number of studies on genetic connectivity of marine organisms has increased in the last decades (e.g., Jones et al., 2007;Eble et al., 2015), mainly due to their importance for understanding biogeographic patterns (Rocha et al., 2008) and for management and conservation (Palumbi, 2003(Palumbi, , 2004. Many organisms associated with reef habitats are relatively sedentary as adults, dispersing primarily during the larval phase (Leis, 1991). Dispersal potential can be related to spawning strategy (e.g., pelagic vs. demersal), pelagic larval duration (PLD), and rafting ability (Luiz et al., 2013). However, geographic barriers and ecological constraints can limit dispersal, leading to self-sustained and genetically structured populations, even in species with extensive PLDs (Planes, 1998;Almany et al., 2007;Jones et al., 2007;Selkoe, Toonen, 2011).
Several reef fishes are widely distributed along the western Atlantic, but for most of them it is unclear whether there is contemporary gene flow among biogeographic provinces or if barriers are restricting it. For example, the Amazon plume is a wellknown barrier isolating Caribbean and Brazilian Provinces (Rocha, 2003;Floeter et al., 2008). The remoteness of oceanic islands also isolates many populations in the southwestern Atlantic (Neves et al., 2016;Pinheiro et al., 2017;Dias et al., 2019). Water temperature and reef formation (e.g., coral and rocky reefs) are also known as ecological drivers acting synergistically to separate populations from northern and southern waters of the southwestern Atlantic (Santos et al., 2006;Van Tassell et al., 2015;Pinheiro et al., 2018). Thus, depending on the strength of such barriers, peripheral populations may exhibit lower genetic diversity as a consequence of smaller effective population sizes and isolation (Eckert et al., 2008).
Butterflyfishes (Chaetodontidae) have a circumglobal distribution, being present in all tropical and subtropical oceans and are among the most conspicuous fishes on reefs, comprising around 134 species. The vast majority of species (around 90%) inhabits the Indo-Pacific (Nelson, 2006;Pratchett et al., 2014). In the western Atlantic Ocean there are only nine species, mostly found in the Caribbean. The banded butterflyfish Chaetodon striatus Linnaeus, 1758 has a wide distribution, with vagrant individuals found as far north as Massachusetts, and established populations from the coral reefs of the Greater Caribbean southwards to the rocky reefs of the southern Brazilian coast (27ºS) (Carvalho-Filho, 1999;Floeter et al., 2008;Anderson et al., 2015;Pinheiro et al., 2018). Studies that focused on feeding habits and gut contents showed that this species is considered a noncoral generalist feeder, demonstrating plasticity in diet and no selectivity pattern across its distribution range (Liedke et al., 2016(Liedke et al., , 2018. Like for other butterflyfishes, the pelagic larval duration (PLD) of C. striatus lasts between 32-52 days (Booth, Parkinson, 2011;Leis, Yerman, 2012). Despite long PLDs, Indo-Pacific butterflyfish populations have been shown to present a range of low to high genetic structure (McMillan et al., 1999;Craig et al., 2010;Lawton et al., 2011;DiBattista et al., 2012;Messmer et al., 2012). In the western Atlantic, no previous studies have investigated the influence of biogeographic barriers and ecological drivers on the structure of butterflyfish populations.
To investigate the phylogeography of the banded butterflyfish C. striatus, we used mitochondrial and nuclear molecular markers to evaluate the following questions: 1) Is genetic diversity different among populations? We predict that the Caribbean population should have higher genetic diversity than peripheral populations, since this region is postulated as being the center of origin and accumulation of species in the Atlantic (Rocha et al., 2008).
2) How did population size in different populations change through time? Some populations in the western Atlantic show a recent increase in population size, which may be related to sea level fluctuations during the Pleistocene (Silva et al., 2015).
3) Do the western Atlantic biogeographic barriers and different ecosystems influence the genetic structure of the studied populations? Here we expect a strong genetic signature between Caribbean and Brazilian populations, since the Orinoco/Amazon barrier has been suggested as a driver of speciation and vicariant agent for several coastal taxa (Rocha, 2003;Rocha et al., 2008;Floeter et al., 2008;Cunha et al., 2014). The San Francisco River has also been pointed as a potential barrier diverging sub-provinces along the Brazilian coast (Pinheiro et al., 2018). We also expect genetic structure between oceanic islands and coastal populations, since the islands isolation has also driven the differentiation of many insular species (Neves et al., 2016;Pinheiro et al., 2017;Dias et al., 2019). Moreover, as the water temperature and type of ecosystem influence ecological speciation in the western Atlantic (Rocha et al., 2005;Pinheiro et al., 2018), we may expect some genetic structure between coral and rocky reef populations as well.

4/17
Phylogeography of Chaetodon striatus 4) Is there a pattern of migration between populations? For example, Brazil, as a peripheral province, may be expected to be a sink rather than a source of migrants. The alternate hypothesis is that northward movement is favored across the Orinoco/Amazon barrier by the North Brazilian Current (Luiz et al., 2013).

MATERIALS AND METHODS
Sampling and DNA extraction. A total of 135 specimens of C. striatus were sampled between January 2010 and February 2011, in seven localities, north (one population) and south (six populations) of the Orinoco/Amazon barrier ( Muscle tissue was immediately stored in 95% ethanol at ambient temperature in the field, and later transferred to a freezer at -20 o C in the laboratory. Tissues were digested overnight at 55°C in 650µL of extraction buffer (400 mM NaCl, 10 mM Tris, 2 mM EDTA, 1% SDS) and the DNAs were purified by standard chloroform extraction and isopropanol precipitation (Sambrook, Russell, 2001).
Amplification of the 5' hypervariable portion of the mitochondrial control region was accomplished with nested PCR´s using first the primers CB3L (5'GGC AAA TAG GAA RTA TCA TTC 3') and CR-E (5'CCT GAA GTA GGA ACC AGA TG -Lee et al., 1995) followed by ProL (5'CTA CCT CCA ACT CCC AAA GC 3') and CR-E. Thermal cycling in polymerase chain reactions (PCR) consisted of an initial denaturation step at 94°C for 45 sec, then 35 cycles of amplification (45 sec of denaturation at 94°C, 30 sec of annealing at 52°C with the first pair of primers and 54°C with the second, and 1 min of extension at 72°C) and a final extension of 5 min of 72°C. In addition, we amplified and sequenced segments of the mitochondrial cytochrome b (cyt b) using the primers GLUDG-L (5'TGA CTT GAA RAA CCA YCG TTG 3') and CB3H (5'GGC AAA TAG GAA RTA TCA TTC 3') with the following protocol: an initial denaturation step at 94°C for 45 sec, followed by 35 cycles of 45 sec at 94°C, 45 sec at 54°C, and 1 min at 72°C and a final extension of 5 min of 72°C. The nuclear protein coding recombinationactivating gene 2 (rag 2) was amplified with the following primers rag 2 1F (5'GAG GGC CAT CTC CTT CTC CAA 3') and rag 2 9R (5'GAT GGC CTT CCC TCT GTG GGT AC 3'). The thermal cycling was performed by an initial denaturation step at 94°C for 45 sec, followed by 35 cycles of 30 sec at 94°C, 1 min at 60°C, and 2 min at 72°C with a final extension of 5 min of 72°C. Each 13ìL reaction contained 5-50 ng of DNA, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl 2 , 1.25 U of Taq DNA polymerase (PerkinElmer -San Jose, CA), 150 mM of each dNTP, and 0.3 mM of each primer. After purification following the manufacturer's protocol (Applied Biosystems -Foster City, CA), direct sequencing was performed with an ABI 3100 automated sequencer (Applied Biosystems) at University of California Berkeley. Sequencing was performed in one direction only for all genes. The identity of the sequences was confirmed by comparing them with the National Center for Biotechnology Information (NCBI) BLASTn tool. We used the computer program Geneious 5.0 (Biomatters) to align the sequences. Our alignments resulted in three matrices of 477 bp, 789 bp and 749 bp for mitochondrial control region (122 sequences), cyt b (125 sequences) and rag 2 (105 diploid sequences, respectively). All analyses were performed with four datasets, each gene individually (control region, cyt b and rag 2) and with both mtDNA markers concatenated (which resulted in a 1266 bp fragment). Since we only obtained three individuals, from Trindade Island, they were only included in the haplotype network analyses.
Genetic diversity and evolutionary history. Genetic summary statistics as nucleotide and haplotype diversity, Tajima's D (Tajima, 1983) and Fu's Fs (Fu, 1997) neutrality tests were estimated for each population, as well as for all individuals in the program Arlequin version 3.5 (Excoffier, Lischer, 2010) and DnaSP (Rozas et al., 2003). Estimates of È (= 2 Íì, where ì is the mutation rate) were calculated for each population as well as for C. striatus as a whole. We used Fluctuate v. 1.4 (Kuhner et al., 1998) to estimate the parameters È and g (following procedures described in Bernardi, Lape, 2005 andDomingues et al., 2005). Analyses were repeated 10 times per region. Coalescence time was estimated by assuming that coalescence was reached when the population size was 6/17 Phylogeography of Chaetodon striatus reduced to 1% of its present-day value. In order to estimate coalescence time, we used the mutation rate (ì) as ì = 8.24-9.3 for control region (Domingues et al., 2005) and ì = 1.5-1.7 for cyt b (Bernardi, Lape, 2005).
Neutrality analyses were performed with Tajimas's D (Tajima, 1989) and Fu´s Fs (Fu, Li, 1993) for all datasets. In addition, the datasets were assessed for the most appropriate model of nucleotide substitution using the Akaike Information Criterion (AIC) as implemented in Modeltest 3.06 (Posada, Crandall, 1998). In addition, in order to analyse population size dynamics though time for both mtDNA loci combined, we used a Bayesian Skyline Plot method implemented in BEAST 2.0 (Drummond et al., 2005). This Bayesian approach incorporates the uncertainty in the genealogy by using MCMC integration under a coalescent model, where the timing of divergence dates provides information about effective population sizes through time. We used the evolutionary model suggested by Modeltest, a length chain of 10 6 and the substitution rates described above. The mutation rate used for control region was 6.94-7.83% and cyt b 2.36-2.67% per million years (Bernardi, Lape, 2005;Domingues et al., 2005).

Connectivity and structure between provinces and ecosystems.
Genealogical relationships among haplotypes were estimated with a median-joining haplotype network using "Pegas" R package (Paradis, 2010). Population structure and gene flow were assessed with an analysis of molecular variance AMOVA (Excoffier et al., 1992) in the program Arlequin version 3.5. For this analysis, alternative scenarios of geographic/ ecosystem subdivision were used: 1) Caribbean (PR) vs. Brazilian Province (TAM, SAL, AB, AC, FLO) -for this, we ran ten AMOVAs with equal sample size, randomly subsampling Brazilian populations each time; 2) São Francisco River influence as an oceanographic barrier (PR, TAM vs. SAL, AB, AC, FLO); 3) coral reefs (PR, TAM, SAL and AB) vs. rocky reefs (AC and FLO). Population pairwise Fst comparisons were calculated using the program Arlequin version 3.5, where statistical significance was obtained using 1,000 replicates. Migration between populations was estimated using Migrate version 2.0 (Domingues et al., 2005) and analyses were repeated 10 times to ensure stability of parameter estimates.

RESULTS
Genetic diversity and evolutionary history. The concatenated mtDNA alignment of Chaetodon striatus resulted in 1266 bp, with 65 phylogenetically informative positions and 110 haplotypes, low nucleotide diversity (0.006 ± 0.003), but high haplotype diversity (0.99 ± 0.001). Sequences generated for the nuclear rag 2 had 10 phylogenetically informative nucleotides and 42 haplotypes (Tab. 1). Low nucleotide diversity (0.002 ± 0.001) and high haplotype diversity (0.76 ± 0.02) were also found among individuals. Although nucleotide diversity was general low, Brazilian populations showed the highest values for the mitochondrial markers, while Puerto Rico, in the Caribbean, showed slightly higher values for the nuclear marker (Tab. 1). Yet, Puerto Rican population showed the highest number of haplotypes for both mitochondrial and nuclear genes, and the highest haplotype diversity for the nuclear gene (Tab. 1).
Population growth and coalescence times were estimated for C. striatus (Tab. 2).  Coalescence time of the entire population was estimated between 32 and 86 Kya. Individual populations show a range from 28 and 119 Kya (Tab. 2). The Bayesian skyline plot (HKY + G model for both segments) and population growth values were consistent in showing a significant population size increase around 80-100 Kya (Fig. 2).

Connectivity and structure between provinces and ecosystems.
Values of pairwise Fst among populations were generally low for all datasets and only the comparison between Puerto Rico and Abrolhos populations for the nuclear gene was statistically significant (Tab. 3). These results were reflected by haplotype networks not showing any obvious pattern of population structure (Fig. 3), including for the isolated population of Trindade Island. The mtDNA network had a star-like shape, where haplotypes were shared between different populations (Fig. 3A). One haplotype present only in Puerto Rico was found in the center of the network, potentially representing an ancestral haplotype. The rag 2 network had three most frequent haplotypes shared by all populations (Fig. 3b). AMOVA results for all scenarios showed that most of the genetic variance in mitochondrial and nuclear markers corresponded to differences within populations, with values between 98-100% (Tab. 4).
Migrate analyses showed a lack of directionality (Tab. 5), ranging from 61 to 823 migrants between sites for the concatenated mtDNA, and from 169 to 887 migrants for the rag 2. The comparison of migration between Caribbean and Brazil did not show an obvious directionality, with a slightly higher number of migrants from the Caribbean to Brazil for to the concatenated mtDNA markers, and an inverse pattern displayed by the nuclear rag 2. However, some sites showed higher values of emigration than immigration (Fig. 4A-B). In general, southern rocky reefs sites received more migrants than were exported, a pattern shared by the Caribbean population when considering the rag 2 gene (Fig. 4B). Brazilian sites with developed coral reef systems, such as Tamandaré and Abrolhos, showed higher rates of emigration than immigration (Fig. 4).

DISCUSSION
The genetic diversity of Chaetodon striatus showed a pattern of low nucleotide diversity and high haplotype diversity, where most of the genetic differences were present among individuals from the same locality. As for other widespread Atlantic reef fishes (Rocha et al., 2008;Silva et al., 2015), these results suggest high connectivity among populations, which is also supported by low Fst values and a lack of genetic structure between provinces and ecosystems. In addition to relatively long PLDs, butterflyfish larvae display an impressive swimming ability prior to settlement, which may increase their dispersal potential (Stobutzki, Bellwood, 1997;Fisher, 2005;Leis, Yerman, 2012). This characteristic may favor the wide distribution of the butterflyfish species, leading to high population connectivity and ecological plasticity.
Moreover, we found evidence for a recent population expansion. Based on neutrality tests performed for all datasets, Tajima's D and Fu´s Fs were negative and non-significant for all populations, and coalescence and Bayesian skyline plot analyses indicated that the population expansion began around 30-120 thousand years ago, all corroborating to the recent population expansion scenario. These results are similar to previous studies on coral reef fishes (Ludt, Rocha, 2015), where sea level fluctuation due to Pleistocene glaciations changed the habitat availability and consequently population size of several shallow water reef fishes (Ludt et al., 2012). Pleistocene sea level low-stands reduced the Caribbean coral reef area by up to 92%, increasing the strength of the Orinoco/Amazon barrier and the isolation of the Brazilian Province (Ludt, Rocha, 2015). However, the central haplotype (potential ancestral) found in the Puerto Rican population may corroborate recent findings that point to the fact that the impact of sea level low-stand was stronger in the southwestern Atlantic (Pinheiro et al., 2017(Pinheiro et al., , 2018. The tropical Brazil accounts for less than 5% of the western Atlantic reef area (Moura, 2000), and is largely influenced by riverine discharge, offering rough conditions for coral reef fishes during these unfavorable times. Habitat persistence during periods of climate change has been proposed as one of the main factors for preserving marine biodiversity (Pellissier et al., 2014) and it may explain the greater reef fish richness found in the Caribbean compared to the Brazilian Province (Floeter et al., 2008;Pinheiro et al., 2018).
Differently to our predictions, and consistent with results obtained for many other fish species from other families (Rocha, 2003;Rocha et al., 2015), neither the Orinoco/ Amazon outflow barrier (Rocha, 2003;Floeter et al., 2008), the São Francisco River (Cunha et al., 2014), the remoteness of oceanic islands (Neves et al., 2016;Pinheiro et al., 2017;Dias et al., 2019) nor the ecosystem divergence (coral vs. rocky reefs; Floeter et al., 2008;Pinheiro et al., 2018) resulted in population structure in the western Atlantic. Although the lack of structure could indicate limited time to divergence, high levels of migration seem to be the main driver connecting all populations of C. striatus. Most of species that display deep lineages divergence and speciation between the Caribbean and Brazil are smaller and inhabit shallower waters than C. striatus (Pinheiro et al., 2018). These high migration rates suggest that C. striatus might be also connected to other Greater Caribbean sub-provinces (Gulf of Mexico and USA East Coast) not assessed in this study (Robertson, Cramer, 2014), despite the very low abundance of the species in such regions (Jeffrey et al., 2001;Robertson et al., 2016).
Interestingly, the high contribution of migrants from the peripheral Brazilian Province to the Caribbean has also been found in other studies (Rocha et al., 2008). Migration is strong enough to overcome the species distributional gap existent in the large area influenced by the Amazon-Orinoco Plume (Fig. 1), a region also affected by vortices that cause retroreflection (NASA, 2013). This phenomenon is driven by the North Brazilian Current, which derives from the South Equatorial Current, running from the eastern coast of Brazil towards the Caribbean. On the other hand, the high connectivity among all analyzed populations indicates that the main coastal currents found along the western Atlantic are flexible to the counter-flow of migrants. Thus, the multi-directionality in larvae migration and the permeability of geographic barriers allow both genetic homogeneity in widespread species (as shown in this study) as well as vagrant dispersal events of provincial species (Freitas et al., 2014).
As a generalist zoobenthivore, C. striatus presents high dietary plasticity, and does not have differences in density and foraging rates or nutritional conditions along its distribution range (Liedke et al., 2016). Our results are consistent with the idea that C. striatus may have evolved sufficient ecological plasticity in the western Atlantic, to enable that species to colonize the largest latitudinal gradient for an Atlantic butterflyfish, encompassing 44 degrees of latitude, and this is reflected in broad genetic signatures over its entire range.