Integrative systematics unveils the controversial identity of Engraulidae fishing stocks in a Neotropical estuary, northeast Brazil

ABSTRACT In Brazil, the use and diversity of the common names of fish species, coupled with taxonomic uncertainties, hinder the reliability of fishing statistical data. In this scenario, there are the so-called pilombetas of the São Francisco River, an important fishing resource in region. Despite its importance, the real diversity of species identified in the area remains obscure. In order to properly identify and delimit the species popularly known as pilombetas, an integrative approach involving traditional taxonomy, geometric morphometrics and molecular systematics was applied. Results from geometric morphometrics and molecular analyses were consistent with the results of the traditional morphological analysis, also indicating the delimitation of six taxa belonging to Engraulidae in the lower São Francisco River. In addition, species delimitation methods revealed an intrapopulation genetic divergence of 1.7% for Lycengraulis grossidens. The results revealed that the currently known richness species of Engraulidae in the studied area has been underestimated. Thus, an updated taxonomic key is herein proposed for the Engraulidae species from the lower São Francisco River and estuary. The integrative analysis approach revealed to be effective to address taxonomic questions and help the management of stocks, ensuring the maintenance of local diversity of fishes in the Neotropical region.


INTRODUCTION
In the Neotropical region, artisanal fishing is one of the oldest activities carried out by riverside communities, especially geared towards species of commercial importance (Begossi et al., 2004;Lins, 2011). This region houses an expressive marine and estuarine fish diversity and richness, but little is known about how fish stocks are recruited throughout their distribution, hindering the management of relevant areas and species for fishing (Blaber, 2000;Polaz, Ribeiro, 2016).
In Brazil, fishing statistics were largely built considering the common names of species rather than the scientific ones, which lead to misidentifications considering either common names used to different taxa or different common names for the same taxon between different localities (Vasconcellos et al., 2011;Previeiro et al., 2013). This approach leads to the underestimation or overestimation of species richness in an area, as well as overexploitation fish stocks. Species of Engraulidae family, for example, are popularly known in Portuguese as apapás, mulatinhas, sardinhas boca torta, manjubas and pilombetas (Silva et al., 2003;Previeiro et al., 2013). Additionally, common names usually used to Engraulidae species, are also applied for fishes from different families, such as manjuba used for both Engraulidae in southeastern Brazil, and Atherinopsidae (e.g., genus Atherinella Steindachner, 1875) in the Northeastern (Ribeiro, Molina, 2013). These oversimplifications can obscure the real conservation status of the stock of those fishes, which may already be in an overexploited state.
3/17 ni.bio.br | scielo.br/ni Engraulidae species are found in tropical and subtropical waters of the Americas, mainly in coastal regions and estuaries, but also in continental waters of South America (Whitehead et al., 1988;Nelson, 2016). The representatives of the family are important fishing resources of great commercial value, catch volume, and wide geographical distribution, making up about 25% of the total production of fishes caught commercially in the world (FAO, 2010;Vasconcellos, Csirke, 2011;Froese, Pauly, 2019). Ecologically, the species play a functional role in aquatic environments as filter-feeding fishes, being great consumers of zooplanktons and serving as a link in the food chain for piscivorous fishes and sea birds (Anderson et al., 1980;Baird, Ulanowicz, 1989;Nelson, 2016).
In the São Francisco River, artisanal fishing plays a crucial role in generating income and employment, with about 287 tons of Engraulidae caught in 2006(Ibama, 2006. This main fishing resource has been showing a sharp decline in recent years, certainly related to the decreased river flow, overfishing effort, especially in breeding seasons (Da Mata-Oliveira, 2020, pers. comm.). In addition, the changes caused in the hydrographic basin by the decrease in flow in recent years have severely impacted several stocks, including pilombetas (Medeiros et al., 2015). It is estimated that at least four species are "informally" known as pilombetas (Loeb, Figuereido, 2014;Barbosa et al., 2017).
In this region, a new species has recently been described, Anchoviella sanfranciscana Barbosa, Gomes da Silva, da Rocha Araújo & Carvalho, 2017(Barbosa et al., 2017. However, according to unpublished data (Agostinho, 2019) A. sanfranciscana is a junior synonym of Anchoviella cayennensis (Puyo, 1945). All these taxonomic issues are perpetuated, due to their great phenotypic similarity and overlapping morphological characters, obscuring the taxonomic identification, quantification of their real diversity and the understanding of their phylogenetic relationships.
In recent years, DNA barcode became the leading molecular tool for species identification, being often integrated with other data sources, enhancing the robustness of taxonomic studies (Ward, 2000;Durand et al., 2017). Among these sources, traditional morphometric measurements used in taxonomy, and geometric morphometrics stand out, allowing the detection of subtle differences at the intra and interspecific level (Vasconcellos et al., 2008;DiBattista et al., 2014;Pérez-Quiñónez et al., 2016;Nirchio et al., 2018). Studies integrating morphology and DNA have been increasingly effective in detecting cryptic lineages (Padial et al., 2010;Yeates et al., 2011), facilitating the resolution of taxonomic issues in complex groups such as the representatives of the Engraulidae family. In this context, an integrated analysis (morphology and genetics) was used to infer the number of species of pilombetas in the lower São Francisco River that are commercially exploited. This study also aims to determine whether there are species of pilombetas not discriminated by traditional taxonomic methods.

MATERIAL AND METHODS
The study area covers the region between the city of Piaçabuçu, Alagoas State (AL) (10°24'20"S 36°26'2"W) and the mouth of the São Francisco River (10°31'50.7"S 4/17 ni.bio.br | scielo.br/ni 36°23'53"W) between the states of Alagoas and Sergipe (SE), Brazil. This region houses the Piaçabuçu Environmental Protection Area (EPA), Federal Conservation Unit (Oliveira et al., 2014), in the lower stretch of the São Francisco River floodplain (Fig. 1). The river mouth region has been one of the most affected in recent years, with reduced flow, introduction of exotic species and the transposition of its waters. In addition, it has been suffering from the advance of the Atlantic Ocean (Brito, Magalhães, 2017;Cavalcante et al., 2020).
All specimens of pilombetas were retrieved from artisanal fisheries caught with a 12/13 mm mesh gillnet, from September 2017 to April 2018. The captured animals were placed in a thermal box containing ice and transported to the Taxonomic identification. Diagnostic measurements and meristic data were examined for all 124 specimens of pilombetas according to Cervigón (1989), Loeb, Figueiredo (2014) and Vera-Alcaraz, Vuletich (2016). Data was gathered with an Opticam OPZTS triocular magnifying glass in the laboratory of the Universidade Federal de Alagoas (UFAL-Penedo). Based on the analyzed characters, an updated taxonomic key for Engraulidae species inhabiting the lower São Francisco estuary was developed. ni.bio.br | scielo.br/ni Geometric morphometrics. All analyzed specimens were photographed from the left side with a 30 cm scale bar through a camera attached to a tripod. From the images, 10 body landmarks (Fig. 2) were chosen following standard procedures in studies of geometric morphometrics with fish and diagnostic characters of the group (Vasconcellos et al., 2008;Kerschbaumer, Sturmbauer, 2011;Loeb, Figueiredo, 2014). The images were digitized and analyzed using the TpsUtil (Rohlf, 2010a) and TpsDig2 (Rohlf, 2010b) softwares. Shape data (Procrustes coordinates) were used for multivariate analyses through principal components (PCA) and canonical variates (CVA), aiming to identify morphological disparities between the species. Both tests were performed with MorphoJ 2.0 (Klingenberg, 2011). DNA extractions, amplification and sequencing. A total of 60 samples of specimens were selected and sequenced based on the preliminary identification, encompassing individuals of Anchovia sp. (n = 7), Anchoviella cayennensis (n = 10), Anchoviella lepidentostole (n = 7), Anchoviella sp. (n = 13), Cetengraulis edentulus (n = 9), and Lycengraulis grossidens (n = 14). Muscle tissue was obtained from the dorsal region of the fishes and stored in eppendorfs microtubes with absolute alcohol in the freezer at -20 °C. Subsequently, the genomic DNA was extracted using the Wizard genomic DNA purification kit (Promega) according to the manufacturer's instructions. The mitochondrial gene fragment cytochrome c oxidase subunit 1 (cox I) was amplified via polymerase chain reaction (PCR), using the universal primers BarcFish1 (5' TCAACCAACCACAAAGACATTGGCAC 3') and BarcFish22 (5' ACTTCAGGGTGACCGAAGAATCAGAA 3'), following the procedures of Ward et al. (2005). All PCR reactions were performed in a 25µl final volume containing 12µl of 2X Taq master mix (Vivantis), 2µl of 40ng / µl genomic DNA solution, 0.5µl of each primer (10MM), and 15µl of ultrapure water. The reactions consisted of an initial 6/17 ni.bio.br | scielo.br/ni denaturation of 2 minutes at 95 °C, 35 cycles of 94 °C for 30s, 57 °C for 30s, and 72 °C for 2 min, with a final extension of 72 °C for 7 min (modified from Ward et al., 2005). Amplification products were purified with 20% polyethylene glycol (PEG). Sequencing was performed in two reads, forward and reverse, with the Big DyeTM Terminator v 3.1 Cycle Sequencing Ready Reaction kit (Applied Biosystems) with the additional primer M-13 and performed on a Model ABI 3130 sequencer (Applied Biosystems).
Data analysis. The electropherograms were analyzed and edited, and consensus sequences were generated in Geneious 9.1.8 (Kearse et al., 2012), followed by a visual inspection for final adjustments. We use NCBI BLASTn and BOLD Identification System (IDS) tools under the Barcode of Life Data System (BOLD) Species Level Barcode Records option to compare and confirm the prior morphological identifications. These sequences were posteriorly aligned using the ClustalW method in Geneious (Kearse et al., 2012). The final alignment was submitted to phylogenetic analysis under Bayesian Inference (BI) and Maximum Likelihood (ML) criteria, as well as the Neighbor Joining (NJ) distance method with the K2P evolutionary model, following Barcode of Life analytical procedures (www.boldsystems.org). To visualize putative differences between species, we calculated intra-and interspecific distances, using complete deletion with the evolutionary model K2P in MEGA6 software (Tamura et al., 2013).
For the BI and ML analyses, the best fitting model of nucleotide substitution was HKY+I, according to the Akaike Information Criterion in PartitionFinder2 (Lanfear et al., 2017). ML analysis was performed in RAxML v.8 and branch support was obtained with 1000 bootstrap pseudoreplicates (Stamatakis et al., 2014). BI analysis was conducted through BEAST 1.8.4 software (Drummond, Rambaut, 2007) to generate ultrametric trees with the Yule Speciation prior model. The tree search was produced by 20 million generations, sampling at each 2 thousand generations. The convergence of Markov chains was inspected in Tracer 1. 6 (Drummond et al., 2012). All values of Effective Sample Size (ESS) were above 200. We applied a 10% burn-in and used the remaining trees to obtain a Maximum Clade Credibility tree. Branch support was based on the posterior probability values. The trees were visualized in FigTree v1.4.1 (Rambaut, 2009).

Identification key of Engraulidae species from the lower San Francisco River
Engraulidae species are characterized by a fusiform small to moderate-sized body (up to 200 mm SL) laterally compressed; cycloid scales present on body and base of caudal and anal fins; body coloration brownish to pale, with a conspicuous silver longitudinal stripe present in most analyzed specimens; eyes laterally positioned in head; mouth subterminal, its posterior margin almost reaching posterior margin of opercle; teeth if present, canine or conical-shaped in a single row in the upper and lower maxilla; caudal fin forked. Geometric morphometrics. The overall morphological variation in the Principal Component Analysis revealed 61% of the variance explained by the first two components (PC1 = 42% and PC2 = 19%). Cetengraulis edentulus, Anchovia sp. were unambiguously distinguished, while the other taxa Anchoviella lepidentostole, Anchoviella sp. and Lycengraulis grossidens showed partially overlapping distributions in the morphospace (Fig. 4). Procrustes distances were significant among all analyzed species (p-values <0.001; Tab. 1). In turn, the Canonical Variation Analysis (CVA) effectively discriminated the body shape of all representatives in six completely different taxa. The two axis CVA1 (47%) and CVA2 (27%) explained 74% of all morphological variation found (Fig. 5). The largest vector variations are related to body (landmarks 3, 6, 7, and 8; Fig. 2) and caudal peduncle depth (landmarks 4 and 5; Fig. 2).
Genetic analysis. The final alignment consisted of 596 bp with 423 conservative and 173 variable sites. The K2P distances revealed the highest genetic distance  between L. grossidens and C. edentulus (17.9%) and the lowest distance between L. grossidens and A. cayennensis (13.7%) (Tab. 1). The molecular identification methods 'Blastn and Species Level Barcode Records' identified Anchoviella sp. and Anchovia sp., previously unidentified at the species level by traditional morphology as Anchoviella brevirostris (Günther, 1868) and Anchovia clupeoides (Swainson, 1839), respectively. The phylogenetic analyses (Fig. 6) resulted in a well-supported tree (bootstrap = 100, and 10/17 ni.bio.br | scielo.br/ni posterior probabilities PP = 1) for most species clade, discriminating seven independent lineages among all specimens analyzed. The group called pilombetas is represented by two clades: clade I (Anchoviella brevirostris + Anchovia clupeoides and C. edentulus) and clade II (A. lepidentostole, A. cayennensis and L. grossidens). Moreover, these analyses also revealed the presence of two divergent lineages within the population of Lycengraulis grossidens from São Francisco River estuary (1.7% of genetic distance). These results were consistent among the three species delimitation methods (GMYC, bPTP, and ABGD; Fig. 6).

DISCUSSION
Our integrative results revealed that seven lineages occurring in the lower São Francisco are currently being called pilombetas. The analysis of meristic characters along with the prior available taxonomic keys effectively discriminated four species (A. lepidentostole, A. cayennensis, C. edentulus, and L. grossidens). The diagnostic characters to identify these taxa were mostly related to the number of anal-fin rays and gill rakers on the upper and lower branches of the first gill arch. For the two taxa initially identified only at genus level, Anchoviella sp. and Anchovia sp., dorsal and pectoral-fin ray counts were useful to the identification, although the remaining characters overlapped, which hindered a specific identification of these two putative taxa. Recently, Loeb, Figueiredo (2014), analyzed samples of Anchoviella from the São Francisco and Amazonas basins, and reported that the overlap of meristic characters for this family hampers the identification by traditional morphology, which requires great experience in studies with the group. In fact, the great morphological similarity between these commercially important species hinders their reliable identification based on external characteristics and, consequently, appropriate control and management actions for these taxa (Bakar et al., 2018). The indiscriminate fishing of several taxa under the same common name may be unsustainably affecting the stocks of each of these species, which generates a worrying perspective, especially in estuaries, which are largely influenced by anthropic actions (Ferreira, Lacerda, 2016), and has a central role as cradles of diversity (Roque et al., 2016).
Geometric morphometrics (GM) analyses have long been used as a tool to identify species and fish stocks (Cadrin, 2000;Silva et al., 2003). In our GM analysis, the CVA was consistent with the traditional morphometry showing two more taxa, Anchoviella sp., and Anchovia sp., posteriorly identified by molecular barcode analysis as Anchoviella brevirostris and Anchovia clupeoides respectively, which substantially segregated in the morphospace visualization in comparison to the other species. Divergence by the amplitude of dorsal profile variation in axis 1 (CVA1) and caudal peduncle in axis 2 (CVA2) seems to follow the patterns commonly found in fish (Perazzo et al., 2018). Duarte et al. (2017), using GM to evaluate the diversity of Carangidae of the coast of Rio de Janeiro in Brazil, were able to identify the occurrence of three species that were previously identified as a single taxon. Barreto et al. (2016), evaluating the species of Nematocharax Weitzman, Menezes & Britski, 1986 in seven locations in the basins of the Bahia and Minas Gerais states in Brazil, found variations in the shape and body size of specimens from allopatric populations, proposing a putative new species for the group, which was formally described (Barreto et al., 2018), attesting the power of geometric morphometrics in distinguishing morphological differences at both intraspecific and interspecific levels. In the case of the present study, a fundamental difference in the fact that the questionable species studied are formally recognized as valid and even belong to different genera. However, the exploration of these groups by fishermen under the common name "pilombetas" reinforces the urgency of the need for effective alternative methodologies in identifying and controlling the use of fish stocks.
DNA barcoding, over the last decades, has been consolidated as the main tool to help classical systematics, especially in taxonomically controversial species that are difficult to identify by traditional methods (Carvalho et al., 2011;Durand et al., 2017).
The present results add to the growing number of studies that attest this effectiveness. In fact, all Engraulidae species identified based only in morphological analyses were confirmed by the molecular results of cox I. Additionally, records in the BOLD and NCBI databases revealed that species of Anchoviella and Anchovia that had not been identified at the species level are Anchoviella brevirostris and Anchovia clupeoides. Thus, it seems evident that building molecular inventories of regional ichthyofaunas (e.g., Brandão et al., 2016) and implementing DNA barcoding by control bodies can increase the effectiveness of fish surveillance. Ardura et al. (2010) studying species of commercial importance in the northern region of Brazil, used the DNA barcoding to differentiate taxa popularly known and marketed as Acarás, which later revealed to be a complex composed of seven species of Cichlidae. In another study, Bakar et al. (2018) analyzed juvenile specimens morphologically discriminated as Lutjanus lutjanus Bloch, 1790 commercially captured in Malaysia and found two genetically distinct and well-supported monophyletic lineages, reaffirming the power of this technique to discriminate cryptic taxa not visualized by traditional methods where diversity may be underestimated. DNA barcoding for fish identification has been shown to be highly effective in distinguishing fish species from various locations (Ward et al., 2005;Ardura et al., 2010;Pereira et al., 2013).
The species delimitation methods GMYC, bPTP and ABGD showed concordant results, suggesting that the Engraulidae of the lower São Francisco encompass seven lineages. Two distinct lineages within L. grossidens from the São Francisco River estuary indicate a cryptic species, which the divergence between them represents 1.7% of genetic distance, and they were not discriminated by morphological characters and morphometric analyses. Previous studies have shown the ecological plasticity and genetic diversity of this species in entering rivers and estuaries along the Brazilian coast (Mai et al., 2016). In this context, future phylogeographic analysis will be able to assess in more detail the evolutionary routes of the L. grossidens populations. In fishing statistics, a crucial problem is the lack of precise identification of fish stocks, especially in Brazil, where common names are widely used. This is an important requirement to correctly estimate trends in productivity, biomass and population dynamics, which in turn helps to define appropriate management strategies. To confuse several species in an evaluation would be equivalent to ignoring the different population dynamics of each individual taxon and their different life histories (Karahan et al., 2014). In the case of fish that are commercially exploited, it is important that they are correctly labeled in research, and mainly in the commercial market (Karahan et al., 2014), as it should happen in the case of pilombetas, considering that they are an important resource of the fishing communities of the lower São Francisco River and in other regions of Brazil (Paiva-Filho et al., 1986;Ibama, 2006;Sampaio, et al., 2015).
The present results revealed that the diversity of species living in the lower São Francisco River called pilombetas has been underestimated. In addition, it demonstrated that combined analyses using morphological (traditional taxonomy and geometric morphometrics) and molecular data (DNA barcoding) can be used to assess species richness and genetic diversity, securing these finite resources for its longterm exploration. Moreover, a new taxonomic key that can accurately assist in the Engraulidae identification for the area is presented. We recommend the development of regional identification keys for other fish groups with similar taxonomic problems, 13/17 ni.bio.br | scielo.br/ni which can assist in this process of identification and improvement of fishing reports. This information is expected to contribute effectively to the proper management of these fish stocks in the lower São Francisco River, as well as in other regions along the Brazilian coast.