Saint Peter and Saint Paul Archipelago barcoded: fish diversity in the remoteness and DNA barcodes reference library for metabarcoding monitoring

Anthropogenic pressures have been depleting global biodiversity. To monitor changes in ecosystems, molecular techniques can be used to characterize species composition. Among the molecular markers capable of identifying species, COI is the most widely used, and its sequencing is the standard procedure for surveying taxonomic information. More recently, biodiversity profiling has also become possible through the assessment of highly fragmented DNA molecules in environmental samples, and medium- and short-length markers are now used in metabarcoding studies. Here, a survey of marine fish from the Saint Peter and Saint Paul Archipelago was barcoded, and the COI barcode procedure identified 21 species from 11 families of fish. Then, the first extensive COI library of these isolated islands was constructed, and from these sequences the most appropriate primer pair for future metabarcoding studies was identified. The new Saint Peter and Saint Paul sequence database has 9,183 sequences from 165 species and 62 families of fish. The overall mean distance among all sequences was 0.4. This distance indicates that the archipelago is a reservoir of biodiversity, as it is higher than that reported for other islands around the world. Therefore, the protection of the archipelago should be enhanced and well monitored with science-based approaches such as DNA metabarcoding, for which the primer pair specifically designed from this library should be considered.


Introduction
Human-induced climate change, habitat fragmentation, and over-exploitation of natural resources have dramatically depleted global biodiversity, especially in the marine environment (Díaz et al. 2006; Butchart et al. 2010; Pinsky et al. 2019). Conservation efforts based on robust biomonitoring programs are necessary to identify and mitigate ecological issues (Stat et al. 2017; Berry et al. 2019); therefore, the preservation of biological diversity depends on accurate species classification (Thomsen & Willerslev 2015; Lin et al. 2020). Species composition and distribution can act as an environmental barometer of human activity (DiBattista et al. 2020).
Given the dominant influence of these activities on mass extinctions, describing the true magnitude of the loss with traditional taxonomic approaches was considered impossible (Blaxter 2003; Hubert & Hanner 2015); hence, molecular techniques have been developed to characterize species diversity quickly and reliably (Krishna Krishnamurthy & Francis 2012; Elbrecht et al. 2019). The field was revolutionized when Hebert and collaborators (2003) proposed that standardized gene regions (DNA barcodes) could be used to identify and discriminate species (Hebert et al. 2003; Hebert & Gregory 2005).
In detail, the mitochondrial gene cytochrome c oxidase I (COI) encodes a key element of aerobic metabolism and is therefore present in all animal species (Hebert et al. 2003). Its 658-bp barcode fragment can be easily obtained from animal tissues, and once sequenced, the divergence among COI sequences provides greater than 97% confidence in differentiating species (Hajibabaei et al. 2005; Meusnier et al. 2008). After nearly two decades, the method has been widely accepted as the standard procedure for surveying biodiversity (Hubert & Hanner 2015; Delrieu-Trottin et al. 2019).
During the same period as the emergence of DNA barcoding, genetic sequencing tools evolved at unprecedented speed, and modern high-throughput sequencing platforms have made possible the rapid acquisition of large amounts of sequence data (Reuter, Spacek & Snyder 2015). Together, these technologies paved the way for a new field in biodiversity assessment, and ecologists now have the tools necessary to analyze the taxonomic composition of environmental samples (Creer et al. 2016).
The genetic assessment of multiple taxa from bulk environmental samples is termed "DNA metabarcoding" (Taberlet et al. 2018); this approach is becoming a well-established tool for monitoring air, feces, sediment, soil, and water samples (Creer et al. 2016; Jarman et al. 2018). With advanced molecular technologies, sensitive DNA extraction protocols, and massively parallel sequencing on next-generation sequencers, the DNA shed by organisms through urine, reproductive or digestive materials, hair, skin, and tissues, and even DNA from dead individuals in the environment, can now be studied (Thomsen & Willerslev 2015; Wangensteen et al. 2018).
However, genetic material extracted from ecosystems is highly fragmented (Deagle, Eveson & Jarman 2006), so retrieving the full-length COI barcodes of a community from marine environmental sources can be challenging in practice (Meusnier et al. 2008). Metabarcoding analyses therefore depend on targeting DNA regions shorter (usually < 300 base pairs) than the traditionally defined barcoding regions (Yu et al. 2012; Clarke et al. 2014; Thomsen & Willerslev 2015).
In this context, alternative metabarcoding markers (metabarcodes) have been developed to obtain biodiversity information from short PCR products (Taberlet et al. 2018). The most promising metabarcodes for meiofaunal characterization are the mitochondrial 12S or 16S rRNA genes (Epp et al. 2012). This is because rRNA folds into hairpin structures required for its role in translation, so the corresponding mitochondrial region is highly conserved, making it reliable for taxonomic assignment (Clarke et al. 2014; Yang et al. 2014).
Another metabarcode option is a much shorter (130 bp) fragment of the > 650 bp mitochondrial cytochrome c oxidase I (COI) barcode. This "mini-barcode" was tested in alignment with full-length COI sequences; the results suggested that the region provides efficient taxonomic identification, and its use was proposed for analyzing environmental mixtures (Meusnier et al. 2008). However, the full core sequences cannot be retrieved by direct sequencing even when DNA targets are successfully amplified, which weakens species identification capability (Sultana et al. 2018).
To overcome this limitation, medium-sized (> 320 bp) barcodes were developed to identify fish species (Shokralla 2015; Günther et al. 2017; Sultana et al. 2018; Collins et al. 2019). For example, medium-sized barcodes have demonstrated the capability to identify fish species even in processed forms and in marine metabarcoding analyses (Shokralla 2015; Collins et al. 2019). Despite the successful use of these markers in fish biodiversity assessment via metabarcoding (McClenaghan et al. 2020; Singer et al. 2019; Russo et al. 2021), biodiversity assessments could be maximized by the use of region-specific reference barcode libraries (Lin et al. 2020).
Customized libraries are particularly important for marine fishes because they act as bioindicators (Ribeiro et al. 2012; Brandão et al. 2016; Delrieu-Trottin et al. 2019). Considering the differences among marine ecosystems and the various habitat requirements for fauna survival, fish richness, abundance, and diversity are crucial indicators of the ecological characteristics of a region (Chovanec, Hofer & Schiemer 2003).
In particular, oceanic islands are biogeographic regions in which species have evolved in isolation (Emerson 2002). This isolation has shaped islands into important reservoirs of biological diversity and refuges for many endemic species (Losos & Ricklefs 2009; Shaw & Gillespie 2016). Physical, biological, and chemical features also affect fish assemblage structure (Andrades et al. 2018). In fact, the high level of fish endemism underscores the urgency of island conservation in Brazil (Andrades et al. 2018).
The Saint Peter and Saint Paul Archipelago (SPSPA) is a small group of plutonic rocks uplifted from the Earth's upper mantle, located in the central equatorial Atlantic Ocean (Fig. 1) (Campos 2005). The archipelago is an important migratory, breeding, and feeding site for fishes (Mendonça et al. 2018). Its isolation has also driven the evolution of a unique fish biodiversity, with a variety of color morphs and genetically divergent lineages (Pinheiro et al. 2020).
The fish biodiversity of SPSPA has been studied since Lubbock and Edwards (1981) listed 50 fish species; surprisingly, the authors considered this species diversity the lowest of any tropical island studied to date. After the first scientific station on the archipelago was inaugurated in 1998, scuba expeditions became possible (Viana et al. 2009). The number of registered species then increased to 75 (Feitoza et al. 2003), to 116 (Vaske Jr et al. 2005), and, most recently, to 225 (Pinheiro et al. 2020). Contrary to Lubbock and Edwards' (1981) conclusions, the latest survey identified the archipelago as having the third-highest level of endemism in the Atlantic (Pinheiro et al. 2020). Among the 225 listed species, 112 are pelagic, 86 are shallow-reef, and 27 are deep-reef shore fishes. The inventory comprises 202 Teleostei in 16 orders and 23 Elasmobranchii in 6 orders (Pinheiro et al. 2020).
Although remarkable for its biodiversity, the archipelago is not immune to ecological impacts from human activities. Intensive fishing in the surrounding waters makes Saint Peter and Saint Paul's species extremely vulnerable, making the area a top priority for biological study and conservation (Viana et al. 2015). To characterize the baseline of Saint Peter and Saint Paul's fish biodiversity, a survey of captured fishes was genetically barcoded with COI. This study aims to provide the first extensive COI library of marine fish from SPSPA. Based on the sequences of surveyed species and of the species listed in Pinheiro et al. (2020) retrieved from BOLD, the most appropriate primer pair for future metabarcoding studies is identified.

Materials And Methods
The field expeditions were conducted between 2004 and 2015 around the Saint Peter and Saint Paul Archipelago (000° 55ʼ N and 029° 21ʼ W). Fishes were randomly caught by an authorized fisherman (license number SISBIO/ICMBio 014/2005). Tissue fragments were labeled (numbered) and preserved in 96% ethanol at −20 °C until DNA extraction.
DNA was extracted using the PureLink™ Genomic DNA Mini Kit (Thermo Fisher Scientific) following the manufacturer's protocol. The forward and reverse "Fish" primer pair (Ward et al. 2005) was used to amplify the cytochrome c oxidase subunit I (COI) gene by polymerase chain reaction (PCR). Each PCR was conducted in a total volume of 25 µL containing 0.2 mM dNTPs, 1× buffer, 1.5 mM MgCl2, 0.2 µM of each primer, 1 U of Taq polymerase, 50-100 ng of template DNA, and ultrapure water to the final volume. The thermal cycling program began with an initial denaturation at 94°C for 5 minutes, followed by 35 cycles of denaturation (94°C for 0.5 minutes), annealing (50°C for 0.5 minutes), and extension (72°C for 1 minute), and concluded with a final extension at 72°C for 7 minutes. The size and specificity of the amplification products were confirmed on 1% agarose gels stained with GelRed (Biotium, Fremont, California). Successful products were purified using exonuclease I and shrimp alkaline phosphatase. Finally, they were sequenced by the Sanger method on an ABI3730XL DNA sequencer (Thermo Fisher Scientific, Massachusetts, United States) with the forward primer used for amplification.
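As a quick arithmetic check on the cycling program above, the following Python sketch adds up the programmed hold times. The constant and function names are illustrative only, and ramp times between temperatures (which depend on the thermocycler) are assumed to be zero, so an actual run will take longer than the value computed here.

```python
# Sketch: total programmed hold time of the PCR profile described above.
# Assumption: ramp times between temperatures are ignored (thermocycler-dependent).

INITIAL_DENATURATION_MIN = 5.0  # 94 °C
CYCLES = 35
CYCLE_STEPS_MIN = {
    "denaturation_94C": 0.5,
    "annealing_50C": 0.5,
    "extension_72C": 1.0,
}
FINAL_EXTENSION_MIN = 7.0  # 72 °C


def total_hold_time(initial_min: float, cycles: int,
                    steps_min: dict[str, float], final_min: float) -> float:
    """Return the total minutes spent at hold temperatures, excluding ramping."""
    per_cycle = sum(steps_min.values())
    return initial_min + cycles * per_cycle + final_min


if __name__ == "__main__":
    minutes = total_hold_time(
        INITIAL_DENATURATION_MIN, CYCLES, CYCLE_STEPS_MIN, FINAL_EXTENSION_MIN
    )
    print(f"Programmed hold time: {minutes:.0f} min")  # 5 + 35 * 2 + 7 = 82
```

Each cycle holds for 2 minutes in total, so the 35 cycles dominate the program (70 of the 82 minutes of hold time).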
The generated sequences were edited in Geneious Pro version 9 and aligned using ClustalW (Edgar 2004) in Geneious. Species were identified using the "Identification Engine" of the Barcode of Life Data System (BOLD), and each sequence was assigned the taxonomic identity of the deposited sequence with the highest similarity score. A neighbor-joining tree was also constructed from the aligned dataset using the Kimura 2-Parameter (K2P) model (Kimura 1980) with 1,000 bootstrap replicates and pairwise deletion in Geneious to estimate the phylogenetic relationships between species. A new primer pair curated specifically for this database was designed in Primer3, and its performance was tested in silico against the Saint Peter and Saint Paul fish sequence repository for future metabarcoding studies.
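The assignment rule described above (take the deposited sequence with the highest similarity) can be illustrated with a toy Python sketch. It is not the algorithm of the BOLD Identification Engine, which is a web service with its own search and scoring; here the sequences are assumed to be pre-aligned, similarity is plain percent identity over positions where neither sequence has a gap or an N, and `percent_identity`, `best_hit`, and the reference names are hypothetical.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identical bases over comparable positions of two pre-aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q in "-N" or r in "-N":
            continue  # skip gaps and ambiguous base calls
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions between the sequences")
    return 100.0 * matches / compared


def best_hit(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (reference name, percent identity) of the most similar reference."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Toy, hypothetical reference library (real COI barcodes are about 658 bp).
    library = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGTTA"}
    print(best_hit("ACGTACGTAA", library))  # ('species_A', 90.0)
```

In practice, the top score should also be inspected before accepting an identification, since a best hit can still be a poor match when the correct species is absent from the reference library.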
All graphics and figures were created and edited using Adobe Illustrator.

Results
The extraction and amplification methods were successful for 26 of 28 samples. All 26 COI barcodes were identified on BOLD with a high percentage of similarity (98.04%-100%) (Table 1), revealing 21 species from 11 families of fishes (Figure 2). Among the 21 species, Canthidermis maculata was the most abundant (three samples), followed by Acanthocybium solandri, Xiphias gladius, and Prionace glauca (two samples each).

Discussion
Biodiversity of SPSPA: Along the floor of the Atlantic Ocean lies one of the largest geological features on the planet, the Mid-Atlantic Ridge, an immensely long mountain range formed by the divergent motion of tectonic plates (Searle 2013; UNESCO 2021). Situated between Brazil and the African continent, the Saint Peter and Saint Paul Archipelago is a rare non-volcanic formation resulting from exhumed mantle rocks of the Mid-Atlantic Ridge (Mohriak 2020). As a consequence of these unique geological traits, along with latitude, weather, marine currents, and biogeographic features, the biodiversity of the SPSPA is correspondingly singular.
As expected from the theory of island biogeography, the site represents an important reservoir of biological diversity and a refuge for many endemic species that have diversified on these islands through time (MacArthur & Wilson 1967; Pinheiro et al. 2017). However, the biodiversity of SPSPA appears to be even more remarkable, as the average within-species K2P distance found in this study was higher than those reported for other islands around the world (Ward et al. 2005; Rock et al. 2008; Zhang & Hanner 2011; Steinke et al. 2017; Bingpeng et al. 2018; Xu et al. 2019).
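The within-species K2P distances compared above can be illustrated with a minimal Python sketch of the Kimura (1980) two-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites differing by a transition and by a transversion, respectively. The sketch assumes pre-aligned sequences and mimics pairwise deletion by ignoring, for each pair, any site where either sequence lacks an unambiguous A, C, G, or T. It is not the Geneious implementation used in this study, and the function names are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
UNAMBIGUOUS = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences (pairwise deletion)."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue  # pairwise deletion: skip gaps and ambiguity codes in this pair
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def mean_pairwise_k2p(sequences: list[str]) -> float:
    """Mean K2P distance over all pairs, e.g. all individuals of one species."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # One transition over 10 sites: P = 0.1, Q = 0, d = -0.5 * ln(0.8)
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

Averaging `mean_pairwise_k2p` over the individuals of each species gives a within-species figure of the kind discussed above; comparisons with other studies are only meaningful if they use the same model and gap treatment.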
Naturally, isolation has played a crucial role in the genetic diversity and endemism of the smallest remote tropical island in the world (Luiz et al. 2015). Besides distance, seamounts may have played an essential role in the marine evolution of the SPSPA: the site, as a peak of the mountain range, acted as a "stepping stone" for fishes during successive periods of sea-level change (Ludt & Rocha 2015; Dias et al. 2019). The topography and strategic location of the SPSPA have also made the rocks an important feeding and reproduction ground for several migratory pelagic species, mostly of high commercial value, such as the yellowfin tuna (Thunnus albacares), the wahoo (Acanthocybium solandri), the rainbow runner (Elagatis bipinnulata), the flying fish (Cheilopogon cyanopterus), the whale shark (Rhincodon typus), the silky shark (Carcharhinus falciformis), and the Galapagos shark (Carcharhinus galapagensis) (Hazin et al. 2008; Viana et al. 2015; Pimentel et al. 2020). Given the heterogeneity of migrant and resident species in the region, molecular techniques are a useful tool to catalogue and uncover the biodiversity of SPSPA. In this study, the authors successfully amplified COI barcode sequences of Saint Peter and Saint Paul Archipelago fishes. Because the surveyed site is a remote and protected oceanic island (Soares & Lucas 2018), the samples of this study were collected opportunistically over different expeditions, and the sample size is therefore limited. Despite these sampling challenges, COI barcodes were successfully amplified from 26 fish samples. The differentiation between species through individual COI barcodes supports the efficiency of COI barcodes for identifying marine fish species.
The AT content of the sequences (56.30%) was higher than their GC content (43.70%), which is congruent with the constructed SPSPA fish database (AT content: 55.70%) and with other COI barcoding studies of marine fish (Ward et al. 2005; Steinke et al. 2009; Mecklenburg et al. 2010; Zhang & Hanner 2011; WS et al. 2011; Costa et al. 2012; Ribeiro et al. 2012; Knebelsberger et al. 2014; Bingpeng et al. 2018; Limmon et al. 2020; Ghouri et al. 2020; Ahmed et al. 2021). The Neighbor-Joining tree also consistently clustered closely related species under the same nodes, while dissimilar species were placed under separate nodes. The identification of surveyed species via COI barcoding was valuable not only to bio-scan the fishes of Saint Peter and Saint Paul, but also to reveal previously undocumented diversity at the site.
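The AT and GC percentages above are simple base counts. A minimal Python sketch is shown below; it assumes that gaps and ambiguity codes are excluded from the denominator (the text does not state how they were handled) and pools all bases across sequences. The function name is hypothetical.

```python
def at_gc_content(sequences: list[str]) -> tuple[float, float]:
    """Return (AT %, GC %) pooled over all sequences, counting only A, C, G, and T."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for seq in sequences:
        for base in seq.upper():
            if base in counts:  # gaps ('-') and ambiguity codes such as 'N' are ignored
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    at = 100.0 * (counts["A"] + counts["T"]) / total
    gc = 100.0 * (counts["G"] + counts["C"]) / total
    return at, gc


if __name__ == "__main__":
    print(at_gc_content(["AATG-", "CTAAN"]))  # (75.0, 25.0)
```

Pooling bases weights longer sequences more heavily; averaging per-sequence percentages instead can give slightly different values when sequence lengths differ.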
New species records for the site: The database of the fish biodiversity of SPSPA constructed by Pinheiro et al. (2020) includes data from expeditions to the shallow and deep reefs of SPSPA and a compilation of literature records from almost four decades of study. Beyond this extensive work, our survey opened up the possibility of uncovering hidden biodiversity of the rocks.
The feasibility of new records for the region is supported by the fact that the DNA barcoding revolution has hastened species discovery during the last 15 years (Cao et al. 2016; DeSalle & Goldstein 2019; Lopez-Vaamonde et al. 2021). Efforts to collect and barcode fish species from specific regions have likewise aided new fish records elsewhere in the world (Steinke et al. 2017; Ahmed et al. 2021).
The methodology applied in this study revealed four new records for the Saint Peter and Saint Paul region: Cheilopogon atrisignis, Cheilopogon nigricans, Remora australis, and Thryssa chefuensis. Considering the natural history of these species, it is plausible that Cheilopogon nigricans and Remora australis inhabit the SPSPA, as their described distributions include neighboring waters of the Atlantic Ocean (Fishbase 2021). In contrast, Cheilopogon atrisignis and Thryssa chefuensis are associated with the Indian and Pacific oceans, respectively (Fishbase 2021). Additional morphometric approaches must be applied to confirm the presence of these four species in SPSPA; nonetheless, these records show that there is still diversity to be discovered in this portion of the ocean, and its protection ought to be ensured.

Future monitoring and conservation considerations:
Due to the presence and connectivity of key species of corals, crustaceans, mollusks, fishes, marine birds, and cetaceans, SPSPA has been protected by the Ministry of the Environment of Brazil since 1986. Despite this protection, commercial fishing boats were allowed to operate in the SPSPA regularly (Viana et al. 2015). In 2018, the Brazilian government increased the environmental protection of the islands and their surroundings (Brazil 2021). However, the vast majority of the new areas is classified as "Area of Sustainable Use", where "subsistence" fisheries are specifically allowed in the management plan; as reported by Giglio et al. (2018), this turned out not to be subsistence fishing but truly commercial fishing and industrial activity by regional fishing companies. Furthermore, the habitats considered most vulnerable to high environmental impact have not received integral protection: the areas of integral protection were designated in places where these activities are already unlikely or rare (Magris & Pressey 2018).
In-depth knowledge and studies are crucial to define boundaries and set goals for Marine Protected Areas. Therefore, systematic data collection across time and space is necessary to better understand the protected ecosystem and to support possible zoning changes. Considering the richness of SPSPA biodiversity and its insufficient protection, advanced genetic tools for monitoring ecosystems are needed. In this case, DNA metabarcoding of seawater has the potential to monitor the ecosystem effectively and to provide solid periodic information to managers and policymakers (Gold et al. 2021).
With special regard to SPSPA fishes, this study provides a reliable DNA barcode reference library for future metabarcoding identifications. In addition, by analyzing all 9,183 sequences, a primer pair able to amplify a short region of DNA was identified, as ideally stipulated for metabarcoding analyses (Taberlet et al. 2018). The indicated marker is able to amplify most, but not all, of the fish species in that library. This is because fishes are the largest group of vertebrates, and teleost and elasmobranch species are evolutionarily distant, so their genetic fingerprints are dissimilar (Nelson et al. 2016). A cocktail of primers should therefore be considered for a comprehensive metabarcoding study of the total fish biodiversity of the region (Collins et al. 2019).
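The kind of in silico check described above (whether a primer pair would amplify each reference sequence, and how long the product would be) can be sketched in Python as below. This is a simplified illustration, not the procedure used in the study: the primer sequences of this work are not reproduced here, so the primers and templates in the example are hypothetical toys; mismatches are counted equally at every position, although real primers are more sensitive to mismatches near the 3' end; only the given orientation of each template is searched; and the 300 bp ceiling follows the usual metabarcoding target size mentioned in the Introduction. All function names are hypothetical.

```python
from typing import Optional

# IUPAC nucleotide codes, so that degenerate primers can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(primer: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(primer.upper()))


def count_mismatches(primer: str, site: str) -> int:
    """Positions where the template base is not allowed by the primer's IUPAC code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_binding_site(template: str, primer: str, max_mismatches: int, start: int = 0) -> int:
    """Return the first index >= start where the primer binds, or -1 if none."""
    for i in range(start, len(template) - len(primer) + 1):
        if count_mismatches(primer, template[i:i + len(primer)]) <= max_mismatches:
            return i
    return -1


def amplicon_length(template: str, forward: str, reverse: str,
                    max_mismatches: int = 1) -> Optional[int]:
    """Length of the predicted product (primers included), or None if no product."""
    template = template.upper()
    forward = forward.upper()
    fwd_pos = find_binding_site(template, forward, max_mismatches)
    if fwd_pos < 0:
        return None
    rev_site = reverse_complement(reverse)  # reverse primer anneals to the other strand
    rev_pos = find_binding_site(template, rev_site, max_mismatches, start=fwd_pos + len(forward))
    if rev_pos < 0:
        return None
    return rev_pos + len(rev_site) - fwd_pos


def library_coverage(library: dict[str, str], forward: str, reverse: str,
                     max_len: int = 300, max_mismatches: int = 1) -> float:
    """Fraction of reference sequences yielding a product no longer than max_len bp."""
    if not library:
        raise ValueError("empty library")
    hits = 0
    for seq in library.values():
        length = amplicon_length(seq, forward, reverse, max_mismatches)
        if length is not None and length <= max_len:
            hits += 1
    return hits / len(library)


if __name__ == "__main__":
    # Hypothetical toy data, not real COI sequences or the primers designed in this study.
    library = {
        "species_A": "GGGACGTTAG" + "C" * 10 + "TTGCACGGG",
        "species_B": "GGGACGTTAG" + "C" * 10,  # lacks the reverse primer site
    }
    print(amplicon_length(library["species_A"], "ACGYTAG", "GTGCAA"))  # 23
    print(library_coverage(library, "ACGYTAG", "GTGCAA"))  # 0.5
```

A coverage value below 1.0 is exactly the situation described above, in which a single primer pair misses part of the library and a primer cocktail becomes worth considering.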

Conclusion
The Saint Peter and Saint Paul Archipelago is a reservoir of biodiversity. Owing to their strategic location, the rocks are an important feeding and reproductive ground for a variety of migratory fishes, as well as a refuge for the third-highest level of fish endemism in the Atlantic. The checklist of fishes that live in shallow and deep waters has already elucidated these outstanding patterns (Pinheiro et al. 2020); however, the genetic signatures of SPSPA fish species had remained unknown. This research therefore set out to barcode surveyed species of the site and to catalog all deposited sequences of the fishes listed for the region. The results indicated that SPSPA fishes are genetically even more distinct than those of other islands around the world. Consequently, the protection of the archipelago should be enhanced and well monitored with science-based approaches. DNA metabarcoding is an emerging tool that could assist in safeguarding SPSPA fauna; accordingly, the reference library and the primer pair specifically designed to study the fishes of these islands should be considered for future metabarcoding monitoring activities.