DNA Barcoding unveils cryptic lineages of Hoplias malabaricus from Northeastern Brazil

(With 3 figures) Abstract The trahira or wolf fish - Hoplias malabaricus - is a valid species, although recent cytogenetic and molecular studies have indicated the existence of a species complex. In this context, the present study analyzed the mitochondrial COI marker to determine the levels of genetic diversity of specimens from the Brazilian state of Maranhão, and verify the occurrence of distinct lineages within the study area. Samples were collected from the basins of the Turiaçu, Pindaré, Mearim, Itapecuru, and Parnaíba rivers. A 630-bp fragment was obtained from 211 specimens, with 484 conserved and 108 variable sites, and 60 haplotypes (Hd = 0,947; π = 0,033). The phylogenetic analyses indicated the existence of three distinct lineages of H. malabaricus from Maranhão. Genetic distances of 1.5-8.2% were found between all the populations analyzed, while the variation between haplogroups ranged from 2.1% to 7.7%. The AMOVA indicated that most of the molecular variation was found among groups, with high FST values. The high levels of genetic variability found in the present study are supported by the available cytogenetic data. These findings reinforce the need for the development of effective programs of conservation and management independently for each river basin, in order to preserve the genetic variability found A AMOVA indicou que a maior variação molecular foi entre os grupos, com altos valores de FST. Os altos níveis de variabilidade genética encontrados no presente estudo são suportados pelos dados citogenéticos disponíveis. Essas descobertas reforçam a necessidade de desenvolver programas de conservação e manejo independentemente para cada bacia hidrográfica, a fim de preservar a variabilidade genética


Introduction
The family Erythrinidae encompasses a small group of Neotropical characiforms, known as trahiras or wolf fish. It comprises of about 15 valid species distributed in the genera Erythrinus Scopoli, 1777, Hoplerythrinus Gill, 1985, and Hoplias Gill, 1903(Bifi, 2013. The family is endemic to South and Central America. Despite being a relatively small group, the precise identification of the species remains problematic (Oyakawa and Mattox, 2009).
Hoplias malabaricus has an elongated, cylindrical body, covered with cycloid scales, and an enlarged head, wide mouth, and rounded caudal fin, with black dots. The coloration of this fish is golden brown, with some dark patches, although this may vary with age. The teeth are robust, conical, and varying in size. The palate has two rows of conical denticles in a V-shaped configuration (Bertollo et al., 2000). Wolf fish are carnivorous predators of aquatic insects, shrimp, and small fish, which they capture by sit-and-wait foraging. Hoplias. malabaricus is well adapted to lentic environments, although it may also be found in rivers, of all sizes, where it prefers areas with vegetation, in which it can hide and wait for prey (Prado et al., 2006). In addition to its ecological importance, this species is an important resource for both artisanal and commercial fisheries, given its abundance in some areas, where it is an important source of food for local fishing communities (Barros et al., 2007).
Some of the H. malabaricus cytotypes (for example 2n = 42A, 2n = 40C and 2n = 40F) are widely distributed in South America, while others are endemic to specific Brazilian river basins (Rosa et al., 2014). As the number of study areas has increased, the known cytogenetic diversity of H. malabaricus has also expanded progressively, to much higher levels than previously thought, with variation being found even within populations defined by a single cytotype (Vicari et al., 2005;Blanco et al., 2010). These differences are apparent at molecular, cytogenetic, and even morphometric levels.
In taxonomy, morphological data has always been the primary criteria for the identification of species. The different approaches to the species concept have received increasing attention in recent years, and a number of different methods have been proposed for the delimitation of species status (Wiens and Penkrot, 2002). More recently, approaches based on DNA sequencing have become increasingly popular, making this method especially effective as a complementary tool to morphology for the identification of taxa (Ward et al., 2005(Ward et al., , 2009Hebert et al., 2003). Hebert et al. (2003) proposed the use of a sequence of approximately 650 bps of the mitochondrial gene Cytochrome Oxidase Subunit I (COI) for the identification of animal taxa at the species level, even in the case of cryptic species.
The available data indicate that H. malabaricus is taxonomically confusing and genetically diverse and is in fact a complex of cryptic species. Given this, the present study applied the COI marker to unveil the genetic diversity of H. malabaricus specimens from the Brazilian state of Maranhão, as well as determine whether the possible local cytotype(s) are characterized by high levels of genetic diversity. The understanding of the genetic variability of the populations of this species found in the rivers of Maranhão will be fundamental to the future development of effective programs of conservation and management. The results of the study indicate the existence of several different genetic lineages in the Maranhão populations, which may even be consistent with the presence of more than one species in a given river basin.

Sampling
The study (Table 1) included a total of 144 specimens of H. malabaricus collected from the basins of the Itapecuru, Pindaré, Mearim, Parnaíba, and Turiaçu rivers between 2010 and 2016 ( Figure 1). The specimens were collected using a variety of fishing techniques, including dragnets, gillnets (of varying mesh sizes), and cast nets, and weirs. All the specimens were euthanized by immersion in freezing water (Ashley, 2007) and deposited in the scientific collection of the Laboratory of Genetics and Molecular Biology of the Caxias campus of Maranhão State University in Caxias, northern Brazil.

Extraction of the DNA, PCR, and sequencing
Total DNA was extracted from muscle tissue using a commercial Promega Wizard® Genomic DNA Purification, according to the manufacturer's instructions. A 650-bp sequence of the 5' region of the Cytochrome Oxidase I (COI) gene was isolated using the COIF1/COIR1 primers (Ward et al., 2005). The PCR was run in a final volume of 25 μl, with 1 μL (250 ng/mL) of the DNA, 0.25 μL (200 ng/mL) of each primer (forward and reverse), 2.5 μL of 10 x buffer (200 mM Tris-HCl, pH 8.4, and 500 mM KCl), 4 μL (1.25 mM) of DNTPs, 0.5 μL (50 mM) of MgCl 2 , 0.2 μL of 5 U/mL Taq DNA polymerase, and distilled water to complete the final reaction volume.
The PCR protocol was as follows: 2 minutes at 95 °C for denaturation, followed by 35 cycles of 30 seconds at 94 °C, 60 s at 56 °C and 60 seconds at 72 °C, followed by a final extension of 7 minutes at 72 °C. The PCR products were purified using an ExoSap-IT kit (USB Corporation), following the manufacturer's instructions. The purified products were sequenced using the Sanger et al. (1977) method, with the Big Dye Terminator v.3.1 Cycle Sequencing Ready Reaction kit (Applied Biosystems, Foster City, CA, USA). The samples were sequenced in an ABI 3500 automatic sequencer (Life Technologies).
The samples were sequenced in both directions (forward and reverse).

Data analysis
The BoldSystems (2019; Ratnasingham and Hebert, 2007) and Blast platforms were used to verify the percentage similarity among the samples analyzed in the study. For the phylogenetic reconstruction, additional 67 COI sequences of H. malabaricus were downloaded from the Genbank. These sequences were included in the analysis to determine whether the geographic distance between specimens influences their genetic divergence.
The sequences were aligned in CLUSTAL W 1.4 (Thompson et al., 1994), which was run in BIOEDIT 7.0 (Hall, 1999). The GENEIOUS program, version 4.8.5 (Biomatters), was used to obtain the consensus sequence and verify the presence of stop codons, as required for the validation of the barcoding sequence. The most adequate evolutionary model for the phylogenetic reconstruction was identified using Modelgenerator 0.85 (Keane et al., 2006). The phylogenetic tree based on Bayesian Inference was then generated in MrBayes 3.1.2. (Ronquist and Huelsenbeck, 2003), and convergence was verified in Tracer 1.5 (Rambaut and Drummond, 2009). The support for the nodes was estimated using a Shimodaira-Hasegawa-like (SH-aLRT) interpretation (Anisimova et al., 2011).
The genetic polymorphism was retrieved in DnaSP 5.10 (Librado and Rozas, 2009). Evidence of selection in the populations was derived from the D (Tajima, 1989) and Fs (Fu and Li, 1993) tests. The matrix of genetic distance was generated in MEGA 7 (Kumar et al., 2016), based on the Kimura 2-parameter (K2P) evolutionary model (Kimura, 1980). A haplotype network was produced in Haploviewer (Salzburger et al., 2011) to determine the degree of correlation between the drainage basins and the evolutionary lineages, and infer the evolutionary relationships among the different species.
The possible existence of differentiated populations and the significance of inter-and intra-population variability were verified using an Analysis of Molecular Variance (AMOVA). Three different levels of hierarchy were adopted in the AMOVA, one which included only the populations from Maranhão, one which included the COI sequences of H. malabaricus obtained from GenBank, and the third considering the haplogroups observed in the BAPS analysis. The fixation index (FST) and its significance were obtained from 1023 random permutations. The neutrality tests of AMOVA and fixation indices were obtained in the Arlequin 3.5 program (Excoffier et al., 2007).
The hierarchy population structure was also analysed using the Bayesian algorithm, run in BAPS 6.0 (Corander et al., 2008). These analyses were based on the population mixing model with individual grouping and k values of between 2 and 5. The log-likelihood score was used to determine the best model and provided a parameter for the definition of the most probable grouping.

Results
A 630-bp fragment of the mitochondrial COI gene was obtained for 211 H. malabaricus specimens. No evidence was found of the presence of insertions, deletions, stop codons or any other alterations of the sequence of amino acids, which validates the sequences obtained here as part of the functional region of the COI gene. The analyses of the nucleotides revealed 484 conserved sites, and 108 variable sites, of which 99 were informative for parsimony analysis. A total of 60 haplotypes (Hd = 0,947; π = 0,033) were identified. The results of the neutrality tests (Tajima's D = -0,11437, Fu's F = -12,553) were negative and non-significant (p > 0.10), indicating the recent expansion of the study populations.
The identification of the specimens was confirmed through the comparison of the nucleotide sequences with those deposited in the BOLDSystems database. The BLAST searches returned a 99.50-100% similarity between the specimens from the Amazon, Paraná, Plate, and São Francisco basins, and those from the rivers of Maranhão. However, the specimens from the Patos Mirim Lagoon were 100% similar to the sequence of Hoplias sp. vieirai found in the BOLDSystems database. Even so, these specimens were 98.67% similar to the H. malabaricus sequence, which is within the 2% threshold for the differentiation of species using the DNA barcode.
The phylogenetic analyses produced haplotype trees with four -well-supported clades -distributed as follows: Clade I -Paraná + Amazon basins; Clade II -Paraná River + Argentina + Patos Mirim Lagoon; Clade III -specimens from the Maranhão basins (Itapecuru + Pindaré + Parnaíba + Turiaçu + Mearim) and the Amazon River, Clade IV-São Francisco basin. A more detailed analysis of the topology revealed the existence of several different haplogroups, including five in the basins of the rivers of Maranhão, with bootstrap support ranging from 0.78 to 0.99 (Appendix S1).
The result of the analysis haplotype network indicated a clear separation between two haplogroups: -one formed by the haplotypes from the Paraná and Plate basins, together with the Patos Mirim lagoon, and the other, by those from the Paraná and Amazon basins. In addition to these two genetically distant groups, the populations from Maranhão were distributed in different lineages, with three distinct haplogroups. The first of these haplogroups includes haplotypes from the Turiaçu, Mearim, Pindaré, Itapecuru, and Parnaíba rivers, together with the Amazon. The second haplogroup encompasses one lineage from the Parnaíba basin, one from the Amazon, and the haplotypes from the São Francisco River. The third haplogroup from Maranhão was formed by the specimens from the Parnaíba, Itapecuru, and Pindaré basins (Figure 2).
The mean genetic distance between populations from river basins ranged from 1.5% to 8.2% when the Brazilian haplotypes are compared with those from Argentina. The distances between the specimens from Maranhão varied from 0.8% to 2.7%, while the distances between these specimens and those from the Amazon basin range from 3.7% to 4.4%. Mean intraspecific diversity was 0.18-4.77%. All other parameters are shown in Table 2. An additional analysis was run to determine the genetic distances between the haplogroups indicated by the haplotype network, which returned distances of between 2.1% and 7.7%, while those between the Maranhão lineages varied from 2.1% to 3.3% (Table 3).
The results of the AMOVA indicate that most of the variation was among groups, with significant F ST values. When the specimens from Maranhão were analyzed separately, most (64.01%) of the variation was also found among populations. To confirm the existence of three distinct  lineages in the rivers of Maranhão, an additional AMOVA was run, considering the haplogroups indicated by previous analyses. In this scenario, most of the variation (55.05%) was among populations, with an F ST value of 0,550, with a highly significant p value, of less than 0.0001 (Table 4). The results of the BAPS program demonstrated the existence of two different lineages in the Itapecuru and Mearim basins, and three in the Parnaíba basin, although only a single lineage was found in the Pindaré and Turiaçu basins. This pattern was upheld when the GenBank specimens were included in the analysis. Three distinct lineages were found in the Amazon and Paraná basins, whereas a single lineage was found in the River Plate and Patos Mirim Lagoon (Figure 3). Both the haplotypes network and the BAPS analysis indicated the existence of three evolutionary lineages in the specimens from Maranhão, which are undergoing genetic differentiation consistent with a process of speciation. It is important to note here that the existence of these three lineages is corroborated by the minor morphological variation found in specimens from Maranhão in previous studies.

Discussion
The effectiveness of the DNA barcode for the evaluation of biodiversity and the resolution of taxonomic problems, such as species identification and discovery of cryptic diversity on fish species, cannot be over-emphazied., delimitation of species in a large number of taxa (Hebert et al., 2003;Clare et al., 2006;Ward et al., 2009). Our study demonstrates the relevance of DNA barcoding for the identification and uncovering lineage diversity of fish species from Brazil. This study contributes DNA barcode data that would be important in identification of H. malabaricus which in turn will aid the management of this economically important fish species.
Further, our study recovered several lineages of H. malabaricus from Brazil. Similar to the studies of Posada and Crandall (2001) our study revealed that gene  trees may not represent the most effective approach for the identification of intraspecific groups, due to their reduced resolution, whereas haplotype networks may be the most reliable approach. This was apparent in the present study, given that the gene trees provided a confusing picture, whereas the haplotype network produced a hierarchy consistent with that of the other analytical approaches, and the same arrangement of haplogroups. This appears to reflect a relatively recent process of diversification, on an evolutionary time scale, with the incomplete separation of lineages and the formation of different haplogroups, which separate the different lineages of H. malabaricus found in the rivers of Maranhão. This result was further reinforced by the BAPS analysis, which also indicated the formation of haplogroups encompassing different lineages of H. malabaricus from Maranhão. This indicates that, while all population of H. malabaricus included in this study have the same cytotype, their genetic divergence is consistent with the separation of the lineages in different clades.
The phylogenetic reconstruction revealed the formation of at least five different haplogroups in the populations from Maranhão. This may indicate the existence of distinct lineages in the local rivers. In an analysis of the D-Loop of the H. malabaricus populations of river basins in Maranhão (the Turiaçu, Pindaré, Mearim, Itapecuru, Parnaíba) Pires et al. (2019) found a high degree of genetic variability (haplotype diversity of 0.984 and nucleotide diversity of 0.044) and three principal haplogroups in the populations from Maranhão, with well-supported bootstrap values of 88-98%. The mean genetic distance between groups ranged from 3.4% to 5.9%, with the population from the Pindaré River being the most differentiated.
Previous studies of Hebert et al. (2003) showed that high rates of intraspecific divergence may be found in geographically isolated populations. However, the findings of the present study indicate high levels of genetic differentiation within a single river basin, resulting in the presence of distinct lineages. Similarly, Santos et al. (2009) also recorded high levels of genetic divergence in a study of the H. malabaricus populations of twelve basins in eastern Brazil (located in the proximity of the watersheds), based on both karyotyping and the analysis of sequences of the mitochondrial ATPase 6 gene. The high level of divergence found in the present study thus appears to be due to local adaptation of the species.
The ecological characteristics of the species should also be considered in any interpretation of evolutionary patterns. Sedentary species tend to be less able to overcome physical barriers or avoid environmental pressures, leading to a greater accumulation of the effects of vicariant events or ecological processes in their DNA (López-Fernández et al., 2013). This may favor an increase in the genetic diversity within the populations of the different river basins, as observed in the present study.
The results of the present study indicate that the levels of genetic divergence of the H. malabaricus specimens may not be restricted to the different hydrographic basins, but may be found in sympatric individuals, as observed in the populations of the Itapecuru, Mearim, Pindaré, and Parnaíba rivers, where at least two lineages were identified, indicating the possible presence of a complex of cryptic species.
The present-day diversity and distribution of freshwater fish species may be determined by a range of factors, including the geomorphological history of the river basins they inhabit (Lundberg et al., 1998;Hubert and Renno, 2006). In the case of Brazilian river systems, one of the primary processes is the variation in sea levels provoked by cycles of glaciation (Weitzman et al., 1998). In particular, the fall in sea level during peak glaciation events may favor the confluence of adjacent river basins, and the dispersal of fish populations between these basins.
In the case of the Turiaçu basin, Nores (2004) noted that, during the Miocene-Pliocene transition, approximately 5 million years ago, sea levels increased by 100 m over a period of 800,000 years. During this period, the Turiaçu basin would have been flooded with seawater, eliminating its populations of freshwater fish. When sea levels receded, these species eventually returned to the basin.
One other process that may account for the similarities of the fauna between basins is headwater capture, which permits the dispersal of ancestral lineages, which may then diverge allopatrically in neighboring basins. Headwater capture occurs when part or all of the basin of one river is diverted to that of a neighboring river through differential rates of erosion, tectonic upraising or landslides, resulting in vicariant events and dispersal (Albert and Reis, 2011). One of the consequences of this process is the formation of non-monophyletic subclades within the population of one river that are derived from the lineages of adjacent rivers (Lundberg et al., 1998). This may account for the sharing of haplotypes between geographically isolated rivers, as observed in the present study.
The more recent the separation of populations or species, the more likely will be the sharing of haplotypes, a phenomenon known as the incomplete separation of lineages (Takahashi et al., 2001). In this case, the period that separates the lineages may not be enough to allow for the establishment of clear genetic differences (Sites Junior and Marshall, 2004). The pattern of the distribution and sharing of haplotypes found in the present study is consistent with a scenario of this type. The recent divergence of species hampers the identification of valid taxa.
In conclusion, our study is extremely important, given that changes occurring in a species at the molecular level may not always be apparent in their morphology, reinforcing the need for the integration of genetic, cytogenetic, morphological, and ecological approaches for fish taxonomy and diversity studies. While a more thorough taxonomic review, integrating morphological, molecular (mitochondrial and nuclear) and cytogenetic data is required, the distinct lineages identified in the present study should be considered as a separate evolutionary unit. Finally, further study is required to determine the evolutionary processes governing the diversification of H. malabaricus in Brazil. Such study is important, for the development of effective measures for the conservation of species.