A scientometric study of the molecular data on Brazilian red algae

ABSTRACT The growth in the number of molecular sequences produced in recent years has contributed substantially to knowledge of the diversity of red algae (Rhodophyta) in Brazil. However, many works indicate that there is still much to be studied. Here, we highlight the importance of compiling these data to reveal possible taxonomic and geographic knowledge gaps in red algae. This study therefore presents a scientometric analysis of the molecular studies carried out on red algae in Brazil, based on DNA sequences available in the public database GenBank. Our results show an increase in published papers from 1994 to 2020, corresponding to 165 studies analyzed. A large disparity in the production of knowledge was observed, with the southeastern region concentrating the researchers responsible for most of the published studies, particularly those at the University of São Paulo (USP). Bahia and São Paulo states stand out with the highest observed taxonomic richness, in that order. We were also able to identify the taxonomic groups with the most molecular data, the number of sequences deposited per institution since 1994, the main markers used, the regions that require greater collection effort, and where the researchers who study red algae in the country are based.


Introduction
Red algae (Rhodophyta) are the most diverse group of marine macroalgae (Stiger-Pouvreau & Zubia 2020), with 7,351 described species (Guiry & Guiry 2022). Of these, approximately 550 occur in Brazil (Flora e Funga do Brasil 2022). However, the dozens of recently described species, including cryptic species, together with the steady increase in reports of new occurrences (Menezes et al. 2015; Flora e Funga do Brasil 2022), indicate that we are far from knowing the entirety of this phycoflora and that the Brazilian territory still has much to be studied. Molecular data have been essential for the taxonomy and systematics of groups that are difficult to identify due to their relatively simple morphology and anatomy, convergence, and great phenotypic plasticity, such as red algae (Saunders 2005).
Increased collection efforts and the several studies carried out with DNA barcoding techniques have led to more accurate identifications and to the correction of erroneous citations of taxa that do not occur on the Brazilian coast (Menezes et al. 2015).
All these efforts have generated exponential growth in the number of molecular sequences, providing valuable contributions to the knowledge of Rhodophyta diversity in Brazil in recent years. GenBank (https://www.ncbi.nlm.nih.gov/genbank/), an annotated collection of all publicly available DNA sequences, is considered the main source of sequence data for many taxa, including red algae (Benson et al. 2013). As a condition of publication, most scientific journals require authors to submit their DNA sequence data to a public database, providing a means for other researchers to retrieve the data (Sayers et al. 2019). However, some researchers have questioned the quality and management of sequences deposited in public databases, indicating the need for studies that verify the reliability and detail of the information (Jin et al. 2020). In addition, such data still need to be compiled in order to reveal, and help close, taxonomic and geographic gaps in the knowledge of algae through molecular data.
In this context, scientometric studies are relevant tools for quantifying scientific literature data in a particular field (Hood & Wilson 2001; Coelho et al. 2014), demonstrating current trends and detecting knowledge gaps, as well as providing guidelines and motivations for future research in specific fields or areas (Coelho et al. 2014; Oliveira et al. 2020). In algal research, some studies have focused on themes such as genomics (Konur 2020a), algal structures (Konur 2020b), and macroalgal biomass (Coelho et al. 2014), or on specific groups such as macrophytes (Padial et al. 2008), seaweeds (Mohan & Ravi 2007; Kumaresan et al. 2015), phytoplankton (Nabout et al. 2015) and dinoflagellates (Noga & Gomes 2018; Oliveira et al. 2020).
The present study provides a scientometric analysis of the molecular studies carried out on Rhodophyta sampled from Brazil, in the form of an updated review based on the information associated with DNA sequences deposited in GenBank since 1994. Our results highlight the most studied taxonomic groups, the regions of the country with the highest concentration of studies and those that require greater collection effort, the number of sequences deposited per institution over the years, and where researchers studying red algae in the country are based.

Materials and methods
The present study was carried out between February 2021 and April 2022. We analyzed data from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) from 1994, when the first sequences of Brazilian red algae were deposited, until 2020.
The search for data was automated using a routine in R, built with the packages reutils (Schöfl 2016) and rentrez (Winter 2017). The dplyr package (Wickham et al. 2020) was used to format the data obtained. The search was performed in the "Nucleotide" database, using the following parameters: ("Rhodophyta" [Organism], "Rhodophyta" [All Fields]) and "Brazil" [All Fields]. A Python script was used to extract the following information: GenBank identification code; species; submitters (authors and institutions); title and year of publication of the article; collection information and voucher (sample herbarium number).
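As an illustration of the search step, the following minimal Python sketch builds (but does not send) a request to the public NCBI E-utilities ESearch endpoint. It is not the routine used in this study, which was written in R. The OR/AND operators in the search term are an assumed reading of the parameters listed above, and the function name `build_esearch_url` and the `retmax` value are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumption: the parameters are combined with OR/AND as shown here;
# the original text does not state the operators explicitly.
SEARCH_TERM = (
    '("Rhodophyta"[Organism] OR "Rhodophyta"[All Fields]) '
    'AND "Brazil"[All Fields]'
)


def build_esearch_url(term, retmax=100, retstart=0):
    """Return an ESearch URL for the Nucleotide database (no request is sent)."""
    params = {
        "db": "nuccore",       # E-utilities name of the Nucleotide database
        "term": term,
        "retmax": retmax,      # number of record IDs per page
        "retstart": retstart,  # offset of the first record ID returned
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url(SEARCH_TERM, retmax=500)
print(url)

# Round-trip check: decoding the URL recovers the original search term.
decoded = parse_qs(urlparse(url).query)
print(decoded["term"][0] == SEARCH_TERM)
```

Paging through results would then amount to increasing `retstart` in steps of `retmax` until all record identifiers have been retrieved.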
The extracted information was tabulated and the following attributes were analyzed: number of papers on molecular taxonomy of red algae published in Brazil so far; number of sequences deposited in public databases per taxonomic family; number of researchers responsible for depositing sequences; molecular markers used in the studies; main research institutions involved and their region; regions of the country with the highest and lowest concentration of studies; number of sequences deposited by institution over the years; most and least studied taxonomic groups. Missing data (fields not completed by the researcher when depositing the sequences) were filled in through direct analysis of each study published until December 2020.
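The tabulation described above can be sketched in a few lines of Python. The records, accession codes and field names below are hypothetical placeholders, not data from this study; the sketch only shows how counts per genus, year and marker could be derived and how records with missing fields could be flagged for manual checking against the published papers.

```python
from collections import Counter

# Hypothetical records standing in for the metadata extracted from GenBank.
records = [
    {"accession": "EX000001", "species": "Gracilaria sp. 1", "year": 2015,
     "institution": "USP", "marker": "rbcL"},
    {"accession": "EX000002", "species": "Hypnea sp. 1", "year": 2019,
     "institution": "USP", "marker": "COI-5P"},
    {"accession": "EX000003", "species": "Hypnea sp. 2", "year": 2019,
     "institution": None, "marker": "COI-5P"},  # institution not filled in
    {"accession": "EX000004", "species": "Gelidium sp. 1", "year": 2015,
     "institution": "UFBA", "marker": "UPA"},
]


def genus_of(record):
    """Take the genus as the first word of the species name."""
    return record["species"].split()[0]


def count_by(items, key_func):
    """Count records grouped by the value returned by key_func."""
    return Counter(key_func(item) for item in items)


per_genus = count_by(records, genus_of)
per_year = count_by(records, lambda r: r["year"])
per_marker = count_by(records, lambda r: r["marker"])

# Records whose institution is missing must be completed from the paper itself.
needs_manual_check = [r["accession"] for r in records if r["institution"] is None]

print(per_genus.most_common())
print(sorted(per_year.items()))
print(needs_manual_check)
```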

Results
We found 3,735 sequences of red algae collected in Brazil and deposited in GenBank. These sequences were included in 165 molecular studies, 11 of which were listed as unpublished. The first sequences of Brazilian red algae were deposited in GenBank in 1994 and, until the end of 2020, there was a trend towards an increase in the number of published molecular studies using sequences from specimens collected in Brazil (Fig. 1). The number of sequences increased between 2012 and 2019, especially through studies led by researchers from teaching and research institutions in São Paulo. Of the twelve studies found in 2012, half focused on the Gracilariaceae, all led by researchers from the Laboratory of Marine Algae Edison José de Paula (LAM), at the Botany Department of the Biosciences Institute of Universidade de São Paulo (USP). Of the 21 studies analyzed in 2019, ten were published by researchers from the Instituto de Pesquisas Ambientais (IPA), although no specific taxonomic group prevailed.
Our analysis revealed that 66% of the molecular studies on Brazilian red algae were published by researchers from Brazilian institutions, although researchers from international institutions (34%), mainly in the USA, also made a substantial contribution. Until 2009, most studies were led by researchers from international institutions (Fig. 2). After this period, researchers from Brazilian institutions increasingly took the leading role in molecular studies of specimens collected throughout the country. When analyzing the distribution of studies by geopolitical region, we found that 86 (77.5%) of the 111 studies were conducted by researchers in institutions from the southeastern region (Fig. 3). This result reflects the work of phycologists at institutions in São Paulo state: USP, IPA and Universidade Estadual Paulista (UNESP), with emphasis on USP, which published 40% of these studies (Fig. 4).
Acta Botanica Brasilica, 2023, 37: e20220174
Over the historical data series (1994 to 2020), there was a growing trend in the number of red algal sequences from Brazilian specimens deposited in GenBank, with the highest numbers observed in 2015, 2018 and 2019 (Fig. 5). In total, 94 taxonomic groups and one sequence from unverified taxa were found. The most studied genera were Gracilaria and Hypnea, while the least studied included Sciurothamnion and Pterocladia, among others (Fig. 6). In 2015, there was a high rate of deposition of sequences from species of the genera Gracilaria (217), Gelidium (162) and Hypnea (138). In 2018, 183 sequences from Amphiroa, 166 from Gracilariopsis, and 135 from Lithophyllum were found, while in 2019 only Hypnea stood out, with 225 sequences deposited.
Because much of the information deposited in GenBank was incomplete (e.g., institutions responsible for depositing the sequences and collection sites), it was necessary to search directly in the scientific papers and/or their supplementary materials to complement the analyzed data.
In general, more international institutions (64%) than Brazilian institutions (36%) were involved in depositing sequences. Despite this, national institutions deposited most of the Rhodophyta sequences collected in Brazil (95%), mainly through the University of São Paulo, which is responsible for 60% of the red algae sequence deposits in the country.
Regarding the number of sequences produced from specimens collected in Brazilian geopolitical regions, the southeastern and northeastern regions have the highest numbers of sequences deposited (Fig. 3). Among the Brazilian states, Bahia (16.5%) and São Paulo (16%) have the highest taxonomic richness based on sequences available in GenBank, followed by Rio de Janeiro and Espírito Santo, representing 13.1% and 12.2% of the sequenced red algae richness in Brazil, respectively (Fig. 7). Concerning the molecular markers used in molecular studies of Rhodophyta from Brazil, the mitochondrial COI-5P (or cox1) and the rbcL have been the most employed, followed by the plastid UPA and psbA and the mitochondrial cox2 and cox3. Table 1 presents the main markers that have been used in these studies.
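One possible way to express richness per state as a share of the total sequenced richness is sketched below in Python. The text does not specify the exact denominator used, so this is an assumed reading (distinct species in a state divided by distinct species overall), and the observations and the function name `richness_share` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (state, species) pairs; one pair per deposited sequence.
observations = [
    ("BA", "Species A"), ("BA", "Species B"), ("BA", "Species A"),
    ("SP", "Species B"), ("SP", "Species C"),
    ("RJ", "Species A"),
]


def richness_share(pairs):
    """Percentage of all distinct species that were sequenced in each state."""
    species_by_state = defaultdict(set)
    all_species = set()
    for state, species in pairs:
        species_by_state[state].add(species)  # repeated sequences count once
        all_species.add(species)
    total = len(all_species)
    return {
        state: round(100 * len(found) / total, 1)
        for state, found in species_by_state.items()
    }


shares = richness_share(observations)
print(shares)
```

Under this reading the shares need not sum to 100%, because the same species can be sequenced in more than one state.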

Discussion
Red algae constitute a vast economic resource for humans, with different industries accounting for several billion dollars per year (Lopez-Bautista 2010). In addition to being used directly as human food (such as the well-known nori seaweed, used in sushi) and animal feed, red algae are also used in the textile, pharmaceutical, cosmetics, biofuels, and biomaterials industries (FAO 2018). In Brazil, species of Rhodophyta are the seaweeds most harvested and/or cultivated for the manufacture of phycocolloids, such as agar and carrageenan (Simioni et al. 2019). Studies have shown they can help in the prevention of cancer and the treatment of skin diseases, in addition to having immunostimulant, photoprotective, antioxidant, antifungal, antiviral, and antibacterial properties, among several other benefits (Rajasulochana & Preethy 2015). In recent decades, the increasing number of molecular studies has been crucial for Brazilian phycological research (Menezes et al. 2015), although scientific production on red algae through molecular data has not been evaluated.
Molecular taxonomy has been intensively adopted by phycologists, which has led to a considerable increase of sequences in public repositories such as GenBank and made it possible to carry out molecular-assisted biodiversity surveys of seaweeds in different regions (Robuchon et al. 2015). Here, we presented an analysis of the progression of the molecular studies on Brazilian red algae over 26 years (1994 to 2020), based on the information associated with sequences deposited in GenBank. Our results showed an irregular upward trend in the number of papers, which seems to have been motivated by the training and qualification of Brazilian phycologists in molecular biology techniques, as well as by the investment in the infrastructure of Brazilian research institutions achieved in recent decades (Menezes et al. 2015; BFG 2018; 2021; Gasper et al. 2020).
Since 2010, most molecular studies on red algae from Brazil have been conducted by researchers from Brazilian institutions. Previously, research conducted in international institutions predominated, especially in collaboration with US institutions. As in phytoplankton research (Nabout et al. 2015), our data revealed that the USA has been Brazil's main collaborator in molecular work with red algae. It is worth noting that, at the peak of scientific production recorded in 2019, all 21 studies were carried out in public universities and research centers in Brazil, with no records of publications led by researchers in foreign institutions. On the other hand, we observed a drastic drop in the number of papers led by authors in Brazilian institutions in the following year, most likely as a result of the financial limitations that the country faces, as well as the health restrictions of the COVID-19 pandemic.
An analysis of the geopolitical regions of Brazil where the researchers are located (Fig. 3) revealed a large disparity in the production of knowledge about red algae across the Brazilian territory, with 83% of the studies having been conducted in institutions of the southeastern region. Of these, USP alone contributes 40% of the total. It is noteworthy that, of the thirteen institutions responsible for knowledge in molecular biology of red algae, only four universities from the northeastern region appear on the list (Fig. 4). Even so, northeastern Brazil is the second most productive region, accounting for 12% of the publications. This asymmetry was less pronounced when we analyzed the number of sequences produced by region, with the southeastern region accounting for 52.2% of the sequences, followed by the northeastern region with 39%.
These findings are in line with the previous information, demonstrating that the regions with a greater number of papers/sequences are not necessarily the regions in which the institutions responsible for such studies are located. This is due to the great concentration of researchers, institutions, universities, and botanical collections in southeastern Brazil (Nabout et al. 2015; Noga & Gomes 2018; Gasper et al. 2020; McManus et al. 2021) with the infrastructure and funding to carry out the stages of molecular biology and bioinformatics work.
During the development of this work, it was difficult to find correct information associated with the sequences, given that some data were unavailable or incomplete in GenBank, namely authors, institutions, and sampling location. Of the 165 publications, 11 (6.6%) were categorized as unpublished, indicating that the sequences were deposited in the database by the responsible researcher and released to the public without a bibliographic reference being attributed to them. Of the 3,735 sequences analyzed, 219 (5.8%) did not have data on specific collection sites. Only one taxon was flagged as unverified, which occurs when the accuracy of the submitted sequence data or annotations cannot be confirmed (Benson et al. 2013).
Concerning the number of sequences produced, we found two main peaks, one in 2015 and another between 2017 and 2019. After analysis, we found that the peaks were related to the prevalence of specific groups of Rhodophyta, mainly as a result of DNA barcoding approaches, where a relatively large number of sequences per specimen is usually necessary to reliably assess intra- and interspecific variation (Saunders & Kucera 2010; Leliaert et al. 2014; Jesus et al. 2016; Porter & Hajibabaei 2018). The most studied genera were Gracilaria (Bellorin et al. 2002; Gurgel & Fredericq 2004; Costa et al. 2012; Soares et al. 2015; 2018; Lyra et al. 2015a; b; 2016; 2021; Iha et al. 2018; Ayres-Ostrock et al. 2019) and Hypnea (Nauer et al. 2014; 2015; 2016; 2018; 2019a; b; 2020; Jesus et al. 2015; 2016; 2019a; b) (Fig. 6). These data coincide with the training of specialists in systematic and taxonomic studies of red algae in the last decade, in institutions from the southeastern (USP and IPA) and northeastern (Universidade Federal da Bahia, UFBA) regions. As stated by Menezes et al. (2015), our findings confirm that the production of knowledge in the area has been concentrated among a few researchers in very few institutions.
Species richness differed profoundly across the Brazilian territory, with Bahia and São Paulo states concentrating the greatest number of species, followed by Espírito Santo and Rio de Janeiro (Fig. 7). However, it should be mentioned that these data reflect the oversampling of these areas for molecular studies more than the actual red algal species richness in Brazil. Our results show an increase in knowledge about red algal biodiversity in northeastern Brazil in recent years, which has begun to close the gap between better-studied and lesser-known regions (Bicudo & Menezes 2010). The lowest richness was observed in continental areas, which was expected, considering that the great majority of Rhodophyta species are marine (Guiry & Guiry 2022).
Given the prevalence of DNA barcoding studies on Brazilian red algae, the mitochondrial cytochrome oxidase I locus (COI or cox1; see Table 1) has been the main molecular marker chosen. Besides being suitable for species identification, delimitation, and discovery (Leliaert et al. 2014), this marker has also been used in phylogeographic studies on Rhodophyta from Brazil (e.g. Zuccarello & West 2002; Paiano & Necchi Jr 2013; 2017a; b; Ayres-Ostrock et al. 2019). The second most used marker is the plastid rbcL, which has been employed together with other markers such as the COI in systematic studies to resolve taxonomic issues (e.g. Lyra et al. 2015a; b; Jesus et al. 2019a; b).
GenBank was created in 1982 (Benson et al. 2013) and is considered the largest repository of genetic data for biodiversity (Porter & Hajibabaei 2018). This study outlines an overview of the molecular studies on Brazilian red algae between 1994 and 2020 based on sequence data from GenBank. Although it is possible that not all Rhodophyta sequences from Brazil are linked to this database, especially the older ones, our data made it possible to analyze the progression of research in the country over the years; the taxonomic groups with the most molecular data generated; the regions of the country with the highest concentration of studies; the main institutions responsible for the deposit of sequences; and where the researchers who study the Rhodophyta from Brazil are based. Some questions remain unresolved, with several gaps and some biases observed. A review of the information associated with some sequences in the database would be necessary, as well as closer inspection of future deposits, which would facilitate the development of taxonomic research on red algae in Brazil and contribute to the scientific and technological advancement of molecular studies in algae as a whole. Furthermore, our findings expose the disparities in the production of knowledge in Brazil in an area that depends on large investments in infrastructure and financing; incentives are needed to reduce these differences and allow scientific advancement in the country.

Figure 1. Number of publications on Brazilian red algae between 1994 and 2020 based on molecular data from GenBank.

Figure 2. Number of researchers over the years linked to national and international institutions.

Figure 3. Number of papers and sequences by geopolitical region of Brazil.

Figure 5. Number of sequences deposited per year.

Figure 6. Total number of sequences produced by genus of red algae from Brazil.

Figure 7. Species richness of red algae by Brazilian state.

Table 1. Main molecular markers used in molecular studies of Brazilian red algae.