Phylogeography of Drosophila buzzatii (Diptera, Drosophilidae): responses of the species to Quaternary climates in tropical and subtropical South America

: Drosophila buzzatii (Diptera: Drosophilidae) is a fly that breeds exclusively on decaying tissues of cacti species widely distributed in tropical and subtropical areas of South America. This distribution includes biomes in distinct climatic regimes (e.g., seasonal rain forest, semi-arid scrubs, savannas, and grasslands), which at first glance could might give the false impression that the species is not sensitive to either climate or vegetation physiognomies. However, detection of historical demographic events within D. buzzatii reveal the interplay between climate and the population structure of the species as the Late Quaternary climate changes occurred. To understand this process, we performed a phylogeographic analysis based on sequences of the mitochondrial gene COI for 128 individuals from 43 localities. Our analyses combined coalescent methods, population genetics, and paleodistributions estimation methods. Our study reveals that the COI haplotype diversity is geographically structured, with a decreasing cline from north to south. The results suggest an ancient range expansion, dated from 610k to 550k years before present, in the northernmost region of the species distribution, the Caatinga vegetation. More recently, an intense gene flow and a population expansion were detected in the central and south portions of its distribution. The demographic events detected date back to the glacial periods of the Quaternary.


INTRODUCTION
The existence of sister taxa occurring in distinct neighboring biomes is usually seen as evidence of former historical connections and disconnections between these biomes.However, under a more refined demographic approach, a species with wide distribution can generate a better understanding of events that occurred in distinct biomes, once one can detect interruptions in gene flow or migration in response to biome changes (Hewitt 2004, Bragg et al. 2015).
The Drosophila buzzatii Patterson and Wheeler, 1942 species cluster comprises a monophyletic groups composed of seven species that use rotting cacti tissues for larval development (Manfrin et al. 2001, Manfrin & Sene 2006).Phylogeographic analysis performed for these species revealed that the main events that explain their current genetic variation distribution are range expansion followed by isolation by distance (De Brito et al. 2002a, Franco & Manfrin 2013, Moraes et al. 2009).However, phylogeographic analyses were only performed for species restricted to one or two biomes (Carnaval et al. 2009).
Drosophila buzzatii is the sister species to the remaining species included in the cluster and it has a wide geographical distribution, including distinct tropical and subtropical landscapes (Manfrin & Sene 2006).This distribution makes it an interesting biological model since an interruption in the gene flow or new migratory events in the species could be a response to changes in the dynamics of the biome.The species is found in the Caatinga (a semi-arid biome, consisting of dry forest and shrubby vegetation, located in northeast Brazil), Cerrado (a mesic, albeit seasonal, savanna, occurring mainly in central Brazil), Chaco (a dry forest ranging from southern Bolivia and northern Paraguay to northern Argentina), Atlantic Forest (a mostly coastal forest ranging from northeast to south Brazil) and Pampa (which occurs in Uruguay, northeast Argentina and in the southernmost territories of Brazil,consisting of grassy prairies with moderate, year-round precipitation) (Ab'Saber 1977, Fiaschi & Pirani 2009, Morrone 2006, Oliveira-Filho & Ratter 1995, Oliveira-Filho et al. 2006, Overbeck et al. 2006, Zanella et al. 2010) (Fig. 1).The reproductive biology of this fly requires oviposition in rotting cactus and, therefore, populations are evolutionarily associated with these plants.Cacti, on the other hand, depend on semi-arid or dry climates, or shallow soils incapable of retaining water.These conditions are observed today in all the biomes mentioned above, either as climatic phenomena (extended dry season, low average precipitations) or due to the presence of geomorphologic and geologic features which favor the maintenance of shallow soils, such as rock outcrops (found in all of the abovementioned biomes).The obligatory association of the fly with many cactus species creates an interesting model to investigate how gene flow is currently occurring, and propose scenarios in which interruptions to this flow could help create genetic structure.
Previous studies have shown that, despite its wide distribution, D. buzzatii has low levels of genetic structure and a high level of gene flow between populations (De Brito et al. 2002a, Kuhn et al. 2003), except for the occurrence of private enzymatic alleles and a high level of polymorphism in mtDNA sequences in populations of the Caatinga biome (Barker et al. 1985, De Brito et al. 2002a).
Monophyly, divergence times, and evolution of host plant use were inferred from a revised phylogeny of the Drosophila repleta Wollaston, 1858 species group (Manfrin et al. 2001, Manfrin & Sene 2006).All these were dated as Prequaternary events and there is a consensus that the glacial cycles have affected tropical and subtropical South America by alternating drier and humid climates affect (Clapperton 1993, Ledru et al. 2005, Sene et al. 1988).
Here, we investigate the phylogeographic history of D. buzzatti with the mitochondrial gene COI and evaluate: (1) the population structure of Drosophila buzzatii; (2) how the observed genetic patterns may be related to Quaternary climate change in South America; (3) how many and what are the directions of migration events between populations; and (4) its predicted distribution using Species Distribution Modelling (SDM) to determine potential corridors for dispersion or gene flow interruption.
The DNA extraction was performed using Wizard Genomics DNA Purification Kit (Promega, Madison, WI, USA).We PCR amplified and sequenced 710 bp fragments of the COI mitochondrial gene using the primers 1406f (Simon et al. 1994) and 2191r (De Brito et al. 2002a).Amplification was performed in a 20 µl reaction with 50-75 ng of DNA, 2,0 mM MgCl 2+ , 0,2 nmol/µl dNTPs, 4 mM each primer, and 0,14 µl of Taq polymerase (5U/µl) in a PCR buffer (pH 8.5) with Tris-HCl, KCl (PROMEGA) The PCR conditions were one step of 94°C for 90 sec., 30 cycles of 94°C for 40 sec., 47°C for 40 sec.and 72°C for 120 sec, and a final extension of 72°C for 10 min.The PCR products were visualized in 1% agarose gels stained with ethidium bromide and purified using ExoSAP-It Kit (Pharmacia).Sequencing reactions were performed according to the BigDye Terminator Cycle Sequencing Ready Reaction Kit (Perkin-Elmer, Foster City, CA, USA).Automatic DNA sequencing was performed in an ABI Prism™ 377 sequencer (Perkin-Elmer).

Genetic diversity and network
49 of a total of 128 DNA sequences were retrieved from Genbank and had been previously published (De Brito et al. 2002a).

Genetic structure and demographic analysis
To evaluate population structure, we used the Bayesian clustering method in the program BAPS v6.0 (Corander et al. 2013), which was performed using the admixture analysis, with values of K ranging from 1 to 6,200 iterations, and five replicates.The group with the highest log likelihood was selected as optimal.Furthermore, we performed the analysis of global molecular variance (AMOVA) with hierarchical groups, according to biomes and the preliminary results of the BAPS program using Arlequin 3.5 (Excoffier et al. 2005).The Mantel test (Mantel 1967, Smouse et al. 1986) was performed using Fst values to check for significant correlations between genetic and geographic distances, estimated using the central geographical coordinate of the distributed points in each biome and the distance from this point to the other points using the equation: Distance i-j = 6371(ACos(Cos(π(90-Latitudei)/180)Cos((90-Latitudej) π/180))+Sen((90-Latitudei) π/180) Sen((90-Latitudej) π/180)Cos((Longitudej-Longitudei)π/180)) and performed in Arlequin 3.5 (Excoffier et al. 2005).
The demographic equilibrium was tested by calculating Fu and Li's F(Fu & Li 1993) and Tajima's D (Tajima 1989) statistics and, to check the demographic expansion, we evaluated both the mismatch distribution and the use of fit between observed and expected distributions by the sum of square deviations (SSD) in DnaSP 5.0 (Librado & Rozas 2009).
Expansion times were estimated with Extended Bayesian Skyline Plot (EBSP), in Beast v1.8.4 (Drummond & Rambaut 2007), using the HKY+I+G substitution model.We opted for a   (Beerli 2006) was used to calculate migration rates and effective population size between pairs of populations, using a coalescent approach (Beerli & Felsenstein 1999).To perform the analyses, we relied on a maximum likelihood estimation with 10 short and 3 long chains, discarding 10.000 trees as initial 'burn-in'.This analysis was performed five times to verify the consistency of the results using a "geofile", consisting of the distance between the geographic cores of the samples in each biome.For the first run, we used values of Theta calculated by Fst and, for subsequent runs, the input values consisted of those obtained in the previous run.

Species Distribution Modelling (SDM)
Past and current potential distribution of Drosophila buzzatii was performed using the Species Distribution Modelling (SDM) approach.For this analysis, we used both current layers, which represent a moist climate, and layers from the Last Glaciation Maximum (21,000 years before the present), which comprised a drier climate, in the resolution of 2.5 arc-minutes.We used the following algorithms and programs to perform the models: the maximum entropy algorithm implemented in MaxEnt (Phillips et al. 2006) and the Support Vector Machine (SVM) implemented in Open Modeler v. 1.4.0 (Muñoz et al. 2011).The sampling points used for these analyses are listed in table III. and the layers used were as follows: annual mean temperature (Bio 1); mean diurnal range (Bio 2); temperature seasonality (Bio 4); maximum temperature of warmest quarter (Bio 8); minimum temperature of coldest quarter (Bio 9); temperature annual range (Bio 7); annual precipitation (Bio 12); precipitation seasonality (Bio 15); precipitation of wettest quarter (Bio 16); and precipitation of driest quarter (Bio 17).These layers represent important environmental conditions for D. buzzatii and their host cacti one time extreme abiotic conditions like high humidity or high temperature variation could affect the survival of the cacti or the fly.We used the "dontextrapolate" function in MaxEnt to avoid spurious projections (Giovanelli et al. 2008, Thomé et al. 2010).The Lowest Predicted Value Threshold (LPT) was used for threshold definition, and probabilities below the threshold value were transformed to zero; a binary map was produced in ArcGis v. 10. 1 (ESRI 2012).The analysis accuracy was checked using 70% of the data as training data and 30% as test data for model validation in 100 replicates.The model evaluation was made by the underlying area (AUC) (Fielding & Bell 1997, Manel et al. 2001).Models' projections (current and Last Interglacial models from Community Climate System Model CCSM, and Model fo Interdisciplynary Research on Climate, MIROC) from the analysis were acquired in the Worldclim database (available at: http://www.worldclim.org/past).

Genetic diversity and network
The standard diversity indexes for the 31 COI haplotypes observed from the 128 sampled sequences are shown in table I.The haplotype Hap1 was the most frequent (67.21% of all samples) and widely distributed in all biomes.The haplotype Hap21 was the second most frequent (7.37%) and it was found only in the Caatinga and Cerrado biomes.The number of private haplotypes for each biome is as follows: 9 for Caatinga, 5 for Cerrado, 6 for Atlantic Forest, 2 for Chaco, and 3 for Pampa (Fig. 1 and Table II).
The highest nucleotide diversity and haplotype diversity were found in the Caatinga populations (π = 0.00265, Hd = 0.79), followed by Cerrado, Atlantic Forest, Pampa, and Chaco, in a decreasing north to south cline (Table II).

Genetic structure and demographic analysis
The results of BAPS showed that the optimal genetic structure is composed of two groups: Group 2, with populations 1, 2, 3, 4, 5, 10, and 11 all belonging to the Caatinga biome, and populations 23-30 from the Atlantic Forest; Group 1, with the remaining populations (Fig. 2).
The global AMOVA indicated weak population structure (Fst = 0.31, p < 0.05).In the different scenarios tested with a hierarchical AMOVA, we found better support for the scenario of two groups inferred by BAPS (Fct = 0.54, p < 0.05), being those that maximized the Fct and minimized the Fsc.The use of AMOVA to test the biomes has been proved not to be appropriate for grouping, since the difference among groups is lower than that within groups (Table III).
The Mantel test did not reveal a correlation between population structure (Fst) and the geographic distance in D. buzzatii (correlation coefficient (rY1) = 0.90; p > 0.05).
Together, the neutrality tests and the mismatch distribution only indicated a scenario of demographic expansion for Group 1 and all D. buzzatii samples.However, for Group 2, the hypothesis of expansion supported by the mismatch distribution analyses cannot be rejected (Table IV; Supplementary Material -Figure S1).These results were corroborated by the EBSP analysis, for which the result (not including the value zero) was in the 95% HPD interval for the number of demographic changes (Heled & Drummond 2008), indicating an older  population expansion in the last 0.05-0.08Ma (Figure S2).
Our results from MIGRATE showed migratory movements from Caatinga to Atlantic Forest; from Cerrado to Caatinga and Pampa; from Atlantic Forest to Cerrado; from Pampa to Chaco; and from Chaco to Atlantic Forest (Table V).The most intense movements occurred from Cerrado to Pampa, followed by Caatinga to Atlantic Forest, and then from Chaco to Atlantic Forest.
The Lowest Predicted Value Threshold (LPT) calculated was to the current MaxEnt model: 0.0903 and to SVM: 0.1576.Last glaciation models were plotted based on the current results in MaxEnt (Fig. 3; Figure S3).Current models showed a narrow distribution area between Caatinga, Cerrado, and Atlantic Forest and large areas potentially connecting Chaco to Pampa.Both CCSM and MIROC predictive models showed a connection between Cerrado and Chaco, although the Pantanal biome is not shown in the current models.The MIROC model showed a connection along with the current submerse coast of Brazil and Uruguay that could represent a past distribution when the ocean levels were lower than what is seen today.The overlapping map (Fig. S1) showed two main stable regions: (i) a narrow portion between Caatinga, Cerrado, and Atlantic Forest, not connected with Pampa;

DISCUSSION
The Drosophila buzzatii cluster is composed of endemic South American species, whose larval development occurs exclusively in the decaying tissues of different cactus species.Some species of this group show an intimate relationship with one or a few cactus species, which makes them restricted in distribution by the cactus occurrence in one or two biomes (Manfrin & Sene 2006).The current distribution of the D. buzzatii cluster seems to be related to glaciation events, where cooling and drought periods enabled expansion and contraction movements  of their host cacti (Moraes et al. 2009).Although D. buzzatii, sister to the remaining species of the clade, has a wider distribution, occurring in all biomes in the east portion of South America, in tropical and subtropical regions, it is very often found in sympatry with other species in the same group.In the present work, the demographic history of D. buzzatii, inferred using information on genetic variation based on COI sequences, suggests area expansion and migrant interchange between biomes in the east portion of tropical and subtropical South America.
Low haplotypic and nucleotide diversity indexes had been previously reported (De Brito et al. 2002a, Franco et al. 2006, Kuhn et al. 2003) although the Caatinga biome showed the highest values in our analyses.Considering the founder effect in natural species, high values of genetic diversity are often found in older populations, while more recent populations usually show a reduction in diversity due to the foundation effect and deviation bias.A classic example is seen in the Silvereyes (Zosterops lateralis Latham, 1802) populations, well documented in Clegg et al. (2002).According to our data, diversity was lost in a north to south direction, suggesting that the Caatinga has the oldest remnant populations of D. buzzatii.Previous studies have suggested an older population of D. buzzatii in the Chaco, due to the high levels of chromosome inversion (Naveira et al. 1984).Nevertheless, Drosophila buzzatii has recently colonized areas in Spain and Australia with a significant bottleneck, and high levels of chromosomal polymorphism was found in these populations (Hasson et al. 1992).According to this, the high levels of chromosomal inversions found in the Chaco is not necessarily an indication of an ancient population, but the result of natural selection.Variation in the viability, developmental time and thorax size of D. buzzatii is dependent on two chromosomal inversions and host cactus species used by the fly, concluding that the cactus/karyotype relation is determinant for the adaptative value (fitness) of D. buzzatii (Iriarte et al. 2003, Iriarte & Hasson 2000).Similar results for Drosophila mojavensis have been found, showing that chromosomal inversion polymorphisms are only present in populations using the cactus agria [Machaerocereus gummosus (Engelm.)Britton & Rose] as a host in Baja California (Powell 1997).
One possible explanation for the current genetic diversity distribution is a significant and ancient movement, with the species expanding its distribution from the Caatinga biome to the southern areas, such as the Atlantic Forest biome.Similar movements were detected in other species of the cluster D. buzzatii.For example, in Drosophila gouveai Tidon-Sklorz & Sene, 2001, two groups expanded their area through mountain tops from the south to the Caatinga biome, one in the east (Group G1) and another in the west portion of the Paraná River in Central Brazil (Group G2) during glaciations (Moraes et al. 2009).Other examples are D. serido Vilela & Sene, 1977, D. seriema Tidon-Sklorz & Sene, 1995and D. borborema Vilela & Sene, 1977, in the Caatinga, at the Chapada Diamantina.Specifically for D. serido, a migratory route from the interior of the state of Bahia to the coast of the state of Santa Catarina, in south Brazil, has been previously described (Franco & Manfrin 2013, Kokudai et al. 2011).
These migratory movements were possible due to climatic connections reflected by vegetation continuity between the biomes studied.The analysis of floristic composition in the seasonally dry tropical forest (SDTF) by Prado (2000) showed three cores -I: Chaco and related vegetation; II: Amazonia, Atlantic Forest, and Cerrado vegetation; III: tropical seasonal forests of South America, Caatinga, the calcareous outcrops in Minas Gerais/Mato Grosso do Sul and the Planalto forests of São Paulo, to the Upper Uruguay and Paraná River valley vegetation, and the Subandean Piedmont Forests.Those cores were connected by what this author named "tracks" (see Prado 2000) or paths.The first nucleus (Chaco) is connected to the second (Cerrado, Amazonia, and Atlantic Forest) by two paths: the northern one links these regions by the calcareous outcrops in the Cerrado, the central portion of Brazil, and the western portion of Bahia; the southern one extends along the valleys of the São Francisco and Jequitinhonha rivers and to the south, in the Rio das Velhas to the Belo Horizonte area and the states of Rio de Janeiro, São Paulo, Paraná, and Santa Catarina.According to our SMD models, an area of stability connecting these biomes is estimated (Fig. 3).Interestingly, the cactus Cereus hildmannianus K. Schum is found in non-flooding portions of the Pantanal biome, known as "Capões" (personal observation).The second nucleus (Paraguay-Parana basin core) links the three nucleuses and has vegetation formations that are very similar to the Caatinga arboreal formation.The third nucleus extends from the southeastern region of Catamarca in northwestern Argentina, to Bolivia and the Brazilian state of Acre (Prado 2000).
Using a molecular clock and a coalescentbased analysis, the expansion movement was estimated at approximately 510,000 ybp, with an exponential growth shown by the BSP analysis (Fig. 2).This population expansion matches the end of the Pre-Illinoian glaciation stage (2.5-0.5 Ma), a global dry/cold period that might have opened the way to the expansion of open vegetation and its associated fauna in South America.The expansion of open vegetation in South America during dry/cold glaciation events has been suggested by many authors (Ab'Saber 1977, Ledru et al. 2005, 2006, Pennington et al. 2000, Prado 2000, Prado & Gibbs 1993).According to this hypothesis, in dry/cold environmental conditions, plant species would disperse to new areas and very likely be followed by migration of its associated fauna as well.One example is the expansion of the seasonally dry tropical forest (SDTF) through the south and north routes proposed by Prado (2000).According to Pennington et al. (2009), the SDTF is distributed in clades, which they named nucleuses, that are interpreted as ancient biomes, and in South America, one of them formed an arc of plant species adapted to dry conditions, called the Pleistocene Arc (Prado 2000, Prado & Gibbs 1993).The connections between these cores in dry/cold periods of vegetation expansion have been possibly used as a bridge to the population expansion in D. buzzatii.
The BSP's results from each biome (data not shown) suggests a population growth in the Illinoian period (0.3 -0.13 Ma) with exception of the Caatinga biome (Pre-Illinoian, around 0.51 Ma).According to our results, the oldest population growth movement was detected in the Caatinga biome, and it followed a south route towards the Atlantic Forest, around 0.51 Ma ago.This movement was identified by the Migrate analyses with an effective migrant number of 2.10e 3 (table IV).Caatinga, Cerrado and Atlantic Forest are adjacent biomes and share the STDF vegetation (Pennington et al. 2009).Connection areas in the model overlapping showed similar results to the South Route of SDTF dispersion as proposed by Prado (2000).
Another expansion movement detected in this work occurred from the Atlantic Forest to Cerrado, around 295,000 ybp.The Cerrado is located alongside the Caatinga and the Atlantic Forrest, sharing many species with these biomes.This geographic position could allow species exchange between those biomes during dry/cold periods by opening routes promoted by vegetation expansion.Another expansion movement was found around 240,000 ybp, from Cerrado to Caatinga and Pampa.Similar dispersion events were found in other Drosophilidae species in a very short period (De Brito et al. 2002b, Moraes et al. 2009).Moraes et al. (2009) proposed that D. gouveai expanded its area around 313,000 ybp (see table II) (Moraes et al. 2009), which is in agreement with our period of expansion, at least for the movement from Cerrado to Caatinga.Another similar pattern was proposed for D. antonietae Tidon-Sklorz & Sene, 2001, which has an intimate association with the cactus Cereus hildmaniannus, occurring in mesophilic forests along rivers of the Parana-Paraguay basin.For this species, migration pattern from the state of São Paulo, up north, reaching the states of Paraná and Rio Grande do Sul was found (De Brito et al. 2002a).
The most recent expansion movement was from Pampa to Chaco, which could not be further analyzed, since we did not sampled specimens from areas in between these two biomes.One probable explanation for this pattern could be that an expansion of Cereus hildmannianus to Chaco connected these two biomes, and the movements could be similar to those found in D. antonietae (Mateus & Sene 2007).However, we detected an expansion movement from the Chaco to the Atlantic Forest at around 270,000 ybp, which might indicate that D. buzzatii was in the Atlantic Forest before reaching the Pampas.One hypothesis to explain this could be an ancient connection between these biomes through the Pantanal (Fig. 3).However, this connection does not exist today, and this could be a result possibly generated due to the fact that no samples from these regions were collected.
Our data suggest that D. buzzatii has a high dispersion capacity, with high gene flow and low genetic structure between biomes.The dispersion/expansion movements, in general, follow a north to south pattern crossing many biomes in cold/dry climatic periods.These patterns of dispersions help to understand the dynamics of historical migratory events by species that have been very influenced by the glaciations in South America.Conservation strategies must consider these patterns when their action are planned, and could influence the natural dynamics of fauna and flora sensitive to these changes.
de São Carlos -UFSCar), Marco Antônio T. Marinho (Universidade Federal de Pelotas -UFPel), Rafaela L. Falaschi (UEPG) and Simeão S. Moraes (Universidade Estadual de Campinas -UNICAMP) for their suggestions on the first drafts.Special thanks to the Departamento de Genética (FMRP-USP) for giving structural and personnel support.During the preparation of this paper, MHS benefited from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -Brasil (CAPES -Finance

Figure 1 .
Figure 1.Haplotype network for COI sequences and a map of South America showing sampling points for Drosophila buzzatii and the biomes comprised in this study.Each line of the network represents one mutational step.The size of each circle indicates the number of individuals having that haplotype, and colors represent distinct biomes.Black circles represent putative unsampled haplotypes.Numbers shown on the map are related to populations, and the network numbers indicate haplotypes.
and (ii) Chaco with Pampa, with only a few stable areas in Pampa.Areas in Peru and Chile have also been projected, although they were not considered natural distribution areas of D. buzzatii.

Figure 3 .
Figure 3. Summary map of historically stable presence areas of Drosophila buzzatii.These projections were obtained by overlapping the predicted logistic outputs in binary maps after threshold calculation in ArcGis 10.1.In A, stable areas according to each model; in B, the model for the current distribution using MaxEnt; in C, SVM model; in D, the model for the last glaciation calculated by MaxEnt, using MIROC prediction; and in E, the model for the last glaciation calculated by the MaxEnt, using CCSM prediction.

Table I .
Location and geographic coordinates of samples used to model the potential current and past distributions of Drosophila buzzatii, with a summary of the COI sequences descriptive statistics.

Table I .
Continuation.

Table II .
Standard diversity indexes calculated for Drosophila buzzatii.
N, number of individuals; Hd, haplotype diversity; π, nucleotide diversity; s, number of variable sites; k, average number of nucleotide differences; Hap, numbers of private haplotypes; Pop, population; Fst, p<0.05*

Table IV .
Neutrality tests results for Drosophila buzzatii populations.

Table V .
Estimative of direction and number of migrants per generation calculated by the program MIGRATE.Biomes with a + indicating migrant receiving.