HERBase: A collection of understorey herb vegetation plots from Amazonia

ABSTRACT Understorey herbs form a diverse and understudied plant assemblage in tropical forests. Although several studies and research teams have long been dedicated to the study of this conspicuous vegetation component in Amazonia, no effort to unify the data has been undertaken to date. In contrast to trees and other life forms for which major data compilations already exist, a unified database dedicated to herbs is still lacking. Part of the problem is in defining what is a herb and how to effectively sample herb assemblages. In this article, we describe the database HERBase, an exhaustive compilation of published and unpublished data on herb inventories in Amazonia. We also describe the structure, functioning, and guidelines for data curation and integration in HERBase. We were able to compile information from 1381 plots from all six Amazonian geographic regions. Based on this dataset, we describe and discuss sampling and knowledge gaps, priority areas for new collections, and recommend sampling protocols to facilitate data integration in the future. This novel database provides a unique biodiversity data repository on understorey herbs that will enable new studies on community ecology and biogeography.


INTRODUCTION
The study of herb ecology and distribution in tropical forests has advanced in recent decades (e.g., Cicuzza et al. 2013; Tuomisto et al. 2003a, 2019; Figueiredo et al. 2022) but is still lagging behind studies of other biological components (Perea et al. 2022), such as trees (ter Steege et al. 2003, 2006, 2013; Draper et al. 2021) and birds (Ribas et al. 2005). It is still an open question to what degree ecological theories and generalizations derived from trees apply to tropical forest herbs (e.g., Janzen-Connell mechanisms). Although canopy trees define forest structure and control most of the carbon and water fluxes, understorey plants such as herbs and shrubs are an important part of the forest ecosystem and contain a large proportion of its taxonomic diversity (Dodson and Gentry 1978; Ribeiro et al. 1999). For instance, herbs were found to contribute about 30% of the total species richness in the Rio Palenque Reserve in Ecuador (Dodson and Gentry 1978), and 23% in the Ducke Reserve in Brazil (Ribeiro et al. 1999). This taxonomic diversity promotes varied and complex interactions with animals (Royo and Carson 2006), and similar interactions can be expected with microorganisms, although these remain largely unknown. Herbs can also affect forest regeneration, as they compete with tree seedlings for space and resources and can thereby filter tree species composition or even prevent or delay tree establishment (George and Bazzaz 1999). However, all these relationships are little studied.
Much of the ecological knowledge derived from floristic inventories of tropical forest plants has been obtained from studies of trees (Hubbell 2001; ter Steege et al. 2013). Trees are laborious to sample, however. General floristic patterns could instead be studied by focusing on understorey plants. Species turnover patterns of at least ferns and lycophytes have been found to be rather similar to those of trees (Ruokolainen et al. 1997, 2007; Vormisto et al. 2000; Jones et al. 2008, 2013; Higgins et al. 2011; Tuomisto et al. 2016). Knowledge of community composition, as well as of the distribution and niches of species, is fundamental for the construction of macroecological and biogeographic hypotheses, and to support conservation programs (Hortal et al. 2015). Inadequate and biased sampling has adverse effects on our understanding of biodiversity patterns (Moerman and Estabrook 2006; Grand et al. 2007; Yang et al. 2013; Oliveira et al. 2016a). Recognized shortfalls of biological knowledge include the Linnaean shortfall (many species remain taxonomically undescribed and unnamed; Whittaker et al. 2005), the Wallacean shortfall (lack of knowledge on the geographic distribution of species; Whittaker et al. 2005), and the Hutchinsonian shortfall (lack of knowledge on the ecological niches of species; Hortal et al. 2015). Field studies documenting biological diversity have been carried out in a highly biased way, such that some regions of the globe and some biological lineages have been intensively studied while little information exists for others (Whittaker et al. 2005; Diniz-Filho et al. 2010; Cornwell et al. 2019).
Inventories based on plot data can greatly reduce such shortfalls of biological knowledge, especially when implemented through standardized and integrated sampling (Magnusson 2013). Combining efforts to build large databases can boost biodiversity research, reduce the aforementioned shortfalls and enable the macroscale analyses required to understand a world under global change (Magnusson 2019). Much of the progress in the understanding of tree, palm, and liana ecology in the last two decades has been due to the creation of collaborative networks, which have either applied standardized protocols to collect biodiversity and demographic data [e.g., CTFS (http://ctfs.si.edu/); PPBio (https://ppbio.inpa.gov.br/); RAINFOR (https://rainfor.org/)] or have compiled existing but dispersed data into accessible repositories [e.g., ATDN (https://www.atdn.myspecies.info); DRYFLOR (http://www.dryflor.info); NeoTropTree (http://www.neotroptree.info)]. Standardised data on ferns and lycophytes across Amazonia have been compiled by the Amazon Research team of the University of Turku (Finland) (www.utu.fi/amazon), but data on the distribution and abundance of other tropical forest herbs have not yet been organised. To fill this gap, we started the Research Network on Amazonian Understorey Herb Communities (HERBase). Its purpose is to assemble data on the abundances of herb species in sampling plots that have a known size and, if available, also associated environmental data. HERBase provides the opportunity to address broad-scale questions, fostering the understanding of herb ecology, evolution, systematics, conservation, and biogeography. Here, we describe the structure and extent of the HERBase database, compiled from projects and research teams that have been dedicated to studying herb ecology in Amazonia.

Definitions
Defining a herb is not straightforward, although dictionaries define herbs as plants without a woody stem. For example, the Merriam-Webster dictionary defines herbs as seed-producing annual, biennial, or perennial plants that do not develop persistent woody tissue but die down at the end of a growing season. The term "seed-producing" excludes ferns and lycophytes. Moreover, defining woodiness may be problematic in practice. For example, some plants are classified as subshrubs by some researchers, but as herbs by others, and some monocotyledons (such as palms, and several Zingiberales) and ferns have fibrous layers around pseudostems or petioles that appear woody, although technically these groups do not produce true wood.
In practice, researchers have tended to consider all ferns, lycophytes, and monocotyledons as part of the tropical forest herb community, but not to include non-monocotyledonous angiosperm plants that might fall in the ambiguous herb, shrub, or subshrub categories (Poulsen and Balslev 1991). This issue complicates analyses based on data compiled from several sources, since most often the definition of a herb is not made explicit in each particular study. There is an obvious need to use clear definitions and standardization in future studies (see the section on recommended best practices in the Supplementary Material, Appendix S1).
Another methodological issue that may create differences among datasets is whether all possible substrates have been included in the sampling. Some herb species are obligate terrestrials and others are obligate epiphytes, but there are also several intermediate habits. For example, many species can grow both terrestrially and as epiphytes, and for species with a climbing habit, it may be difficult to determine whether they have a ground connection or not. Moreover, the substrate may change during the lifetime of an individual plant: an originally terrestrial individual may climb up a tree trunk and later lose its ground connection and become an epiphyte, or an originally epiphytic individual may create a ground connection as it grows larger. Many studies on ferns and lycophytes have taken into account all individuals, including epiphytes and climbers on the lower parts of tree trunks (Tuomisto et al. 2003a, 2003b, 2003c, 2016; Higgins et al. 2011). However, other studies have only included terrestrial individuals (Poulsen and Balslev 1991) or species that are known mostly to be terrestrial (Tuomisto and Poulsen 1996, 2000; Zuquim et al. 2012, 2014; Moulatlet et al. 2014; Tuomisto et al. 2019). Studies are not always clear on the definition of the life forms or habits and substrates included, and the simple assignment of a species to a substrate may be inaccurate. Future studies should pay special attention to clear definitions and documentation of substrates to enhance the usefulness of the data.

Data compilation
In this first compilation, we only considered herb plots located within the limits of the Amazon basin, according to the concept of Amazonia sensu latissimo proposed by Eva and Huber (2005) (Figure 1), which includes areas of savanna in the Cerrado Biome as well as montane areas that drain into the Amazon River. This delimitation of Amazonia encompasses an area of 7,595,000 km² and includes areas from Brazil, Bolivia, Peru, Ecuador, Colombia, Venezuela, Guyana, Suriname, and French Guiana. The studies from which data have initially been compiled into HERBase have had various objectives: complete floristic inventory (Duivenvoorden 1995), assessment of the floristic composition in specific forest types (van Andel 2003; Linares-Palomino et al. 2013), determinants of species richness and/or composition at a local scale (Tuomisto and Ruokolainen 1994; Costa et al. 2005; Drucker et al. 2008; Magalhães and Lopes 2015; da Silva et al. 2021; Rodrigues et al. 2021) or at a regional scale (Tuomisto and Poulsen 1996; Tuomisto et al. 2003a, 2014, 2016; Zuquim et al. 2012, 2014; Figueiredo et al. 2014; Moulatlet et al. 2014; Riaño and Moulatlet 2022), effects of anthropogenic disturbance on herb assemblages (Costa and Magnusson 2002; de Polari Alverga et al. 2021) and the effects of past human forest modifications on herb composition (Quintero-Vallejo et al. 2015). However, future contributions to HERBase need not be restricted to these types of studies, and the database is intended as a repository to which data can be voluntarily contributed through https://www.gov.br/inpa/pt-br/projetos/herbase.

Data curation
Taxonomic standardization is fundamental when compiling vegetation databases to address nomenclature redundancy caused by the multitude of synonyms characterizing botanical literature (Kalwij 2012), and to perform comparative analyses across data sets. In HERBase, all data compiled or received are standardized to a uniform taxonomy based on the Flora and Fungi of Brazil (http://floradobrasil.jbrj.gov.br; BFG 2018, 2021), first, with the use of the 'flora' package (https://github.com/gustavobio/flora) in the R environment (R Core Team 2022) and then by consulting specialized taxonomic works and/or specialist taxonomists. Also, members of HERBase dedicated to the taxonomy of specific plant groups (e.g., ferns and lycophytes) are responsible for curating and updating data before they enter the database.
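The two-step workflow described above (automated name matching followed by expert review of whatever remains) can be illustrated with a minimal Python sketch. This is not the 'flora' package or the HERBase pipeline: the synonym table, the placeholder taxon names, and the functions `normalise` and `standardise` are all hypothetical and serve only to show the logic of separating resolved names from names that need a taxonomist's decision.

```python
# Hypothetical synonym table mapping a recorded name to an accepted name.
# The names are placeholders, not real taxa; a real table would come from
# a taxonomic backbone such as the one cited in the text.
SYNONYMS = {
    "Placeholderia antiqua": "Placeholderia nova",
    "Exampleopsis prior": "Placeholderia nova",
}


def normalise(name):
    """Collapse extra whitespace and enforce 'Genus epithet' capitalisation."""
    parts = name.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def standardise(names, synonyms=SYNONYMS):
    """Return (resolved, unresolved).

    resolved maps each raw name to an accepted name; unresolved lists raw
    names that matched nothing and should be passed on for expert review.
    """
    accepted = set(synonyms.values())
    resolved, unresolved = {}, []
    for raw in names:
        clean = normalise(raw)
        if clean in synonyms:
            resolved[raw] = synonyms[clean]   # known synonym
        elif clean in accepted:
            resolved[raw] = clean             # already an accepted name
        else:
            unresolved.append(raw)            # needs a taxonomist's decision
    return resolved, unresolved


field_names = ["  placeholderia  ANTIQUA ", "Placeholderia nova", "Unknownia sp1"]
resolved, unresolved = standardise(field_names)
```

Keeping the unresolved names in a separate list, rather than guessing a match, mirrors the role that specialist taxonomists play in the second step.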

Sampling coverage
Based on the coordinates of the plots compiled in the metadata, we calculated the spatial density of plots using the kernel density estimation implemented in QGIS. We also addressed the geographical plot coverage using a classification of the Amazon basin as defined by Feldpausch et al. (2012) [northwestern Amazonia (NWA); southwestern Amazonia (SWA); southern Amazonia (SA); central Amazonia (CA); Guiana Shield (GS); and eastern Amazonia (EA)].
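For readers unfamiliar with kernel density estimation, the following Python sketch shows the general idea on a handful of made-up plot coordinates. It is an illustration under stated assumptions, not the QGIS implementation: it assumes a Gaussian kernel, planar (projected) coordinates, and an arbitrary bandwidth, and it rescales the result to the 0 to 1 range so that the densest location has a value of 1.

```python
import math


def kernel_density(points, grid, bandwidth):
    """Gaussian kernel density of `points`, evaluated at each `grid` location."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


def rescale(values):
    """Linearly rescale values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up projected coordinates (km): three clustered plots, one isolated plot.
plots = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (40.0, 40.0)]
# Regular 5 x 5 evaluation grid with 10 km spacing.
grid = [(x * 10.0, y * 10.0) for x in range(5) for y in range(5)]
density = rescale(kernel_density(plots, grid, bandwidth=10.0))
```

The bandwidth controls how far the influence of each plot extends; a larger value smooths clusters into broader regions of moderate density.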

Database characteristics
As of June 2022, HERBase had 1381 plots with inventory data included, and 342 plots with only metadata included and data available upon request to the principal investigator. The inventory of herbs can be based on different metrics to provide an estimate of abundance (e.g., direct counting of individuals, estimates of cover, frequency of occupation). In HERBase, there is a predominance of density data based on the counting of individuals (89% of the plots). Cover data are available for 23% of the plots, frequently in combination with counting data (12.3% of total data).
The most common sampling unit in the herb inventories was a fixed-area plot. Sampling designs without a defined area, such as distance sampling (Buckland et al. 2005), are rarely used for herbs and are not yet present in HERBase. The size of the included plots varies widely (Figure 2). The most common plot areas range between 500 m² and 1000 m² (48%), followed by larger plots of ≥ 1000 m² (29%), and plots smaller than 500 m² (23%). Plot shape varies from square (8.3%) to rectangular transects (40.6%), to transects that follow the elevational contour of the terrain, with plot width adjusted to the organism sampled (51.1%). The largest fraction of plots included in HERBase contains inventory data for all terrestrial herbs (35%), followed by inventories of only ferns and lycophytes (30%). The remaining data are quite variable, including inventories of ferns and lycophytes + Zingiberales (12%), all terrestrial herbs + epiphytes (10%), only Zingiberales (5%), ferns and lycophytes + monocotyledons (3%) and ferns and lycophytes + Araceae + Marantaceae (2%) (Figure 2).
The distribution of plots in relation to local landscape heterogeneity varied according to the aims of the original study. The two most common sampling-unit types in HERBase are: 1) 5-m wide transects (mostly ≥ 500 m long) that run across the local topographical variation (25% of included plots); and 2) 2-m wide and 250-m long plots that maintain a fixed position along elevational contour lines, belonging to PPBio infrastructure (51%). The first design aims to produce data that are representative of the landscape as a whole by maximising the hydrological variation within the transect, thus increasing the diversity of species within the transect and facilitating regional comparisons (Tuomisto et al. 2003c). The second design aims to produce data that are representative of local hydrological or soil conditions by minimizing the topographical variation within each plot, and the landscape-scale variation is captured by establishing many plots per site (Magnusson et al. 2005).
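The grouping of plots into area classes can be sketched in a few lines of Python. The plot dimensions below are made up for illustration, and the handling of the class boundaries is an assumption: the text does not say to which class a plot of exactly 1000 m² belongs, so this sketch assigns it to the largest class.

```python
from collections import Counter


def area_class(width_m, length_m):
    """Assign a plot to an area class (boundary handling is an assumption)."""
    area = width_m * length_m
    if area < 500:
        return "< 500 m2"
    if area < 1000:
        return "500-1000 m2"
    return ">= 1000 m2"


def class_percentages(plots):
    """Percentage of plots falling in each area class."""
    counts = Counter(area_class(w, l) for w, l in plots)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up (width, length) pairs in metres, not HERBase records.
plots = [(2, 250), (5, 500), (20, 20), (2, 250)]
shares = class_percentages(plots)
```

Storing width and length separately in the metadata, rather than only the area, keeps the plot shape recoverable for later comparisons.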

Sampling coverage
The plots are unevenly distributed across geographical regions, with 37% located in central Amazonia, 24.7% in northwestern Amazonia, 15.1% in southwestern Amazonia, 14% in southern Amazonia, 6% in northern Amazonia and 2% in eastern Amazonia (Figure 1a). The highest density of sampling plots (Figure 1b) is located in central Brazilian Amazonia (in the surroundings of Manaus), with other sampling clusters in northwestern Amazonia (around the Peruvian city of Iquitos) and in southwestern Amazonia, close to the border between Brazil and Bolivia and along the Madre de Dios River (Peru). In general, there is a higher density of plots along the main rivers compared to inland areas.
When compared with other plot networks in Amazonia, particularly PPBio and ATDN, HERBase has a much more restricted distribution. Many of the HERBase plots belong to the PPBio plot network (white dots in Figure 1c), while PPBio plots in which herbs have not yet been sampled are represented by violet dots (Figure 1c). The ATDN network has a broad and dense coverage of tree sampling plots over the whole Amazon region, and basically all the PPBio plots included in the ATDN have also been sampled for herbs (Figure 1d). Some of these plots have herb data that have not yet been included in HERBase. Most of the ATDN plots with HERBase data in Brazil coincide with PPBio plots, and this spatial integration of sampling plots allows the integration of scientific data.

DISCUSSION
The HERBase initiative has managed to compile extensive plot-based information on ground herbs sampled in all Amazonian geographic regions. To the best of our knowledge, the >1,700 inventory plots included in HERBase represent a significant part of understorey herb community inventories implemented in Amazonia.

Sampling coverage
Our data compilation showed that the surroundings of the large urban centres in Amazonia have the highest plot density. Although many efforts to sample remote areas have been undertaken in the past decades, sampling is still constrained by complex logistics and high costs, thus being biased towards the areas that are most accessible from urban centres by road or river. Similar access constraints to sampling areas have affected inventories of Amazonian trees (Nelson et al. 1990; Hopkins 2007) and animals (Oliveira et al. 2016b).
In the HERBase data, the surroundings of Manaus (Brazil) stand out as especially intensively collected, followed by the surroundings of Iquitos (Peru). Both of these areas are also of interest to researchers due to their high geodiversity (Higgins et al. 2011; Figueiredo et al. 2014). Smaller concentrations of sampling are found along the river Madre de Dios in Peru, and the rivers Juruá and Tapajós in Brazil. The fact that most of our data come from central and northwestern Amazonia reflects the long-term efforts of a few research groups in local research institutions. Large sampling gaps remain in other Amazonian regions, including, for example, the region of the large urban area of Belém (Brazil), which so far has not attracted researchers to do herb inventories. The comprehensive sampling of such regions would not only improve our understanding of Amazonian herbs, but also provide an environmental description of these regions, as basic environmental data are usually collected together with floristic inventories. In addition, herb inventory data can be used to infer and map environmental variables across Amazonia even when direct environmental measurements are not available (Zuquim et al. 2019; Tuomisto et al. 2019). There is great potential in the use of standardized sampling plots such as those used by the PPBio and RAINFOR projects over broad spatial scales, as it decreases the costs of infrastructure implementation and allows direct comparison of biodiversity data among sampling sites.

HERBase functioning
HERBase emerged from personal contacts among researchers interested in herbs, who agreed to share data and metadata from their plots. Taxonomic data are curated with the most up-to-date sources, and a committee of taxonomists decides on ambiguous cases. Metadata on plot location, size, shape, forest type, and minimum plant sampling size and habit are included for all datasets. HERBase is based on the principle of equality among partners, where all contributors have the same rights to propose uses for the full dataset or parts of it. A five-member HERBase Management Committee is elected among participating researchers of the network, two of whom are in a coordinating role, in addition to two substitutes. Every two years, at least one new member becomes part of the committee, replacing the participating researcher with the longest time on the committee or the one(s) who wishes to resign. The Management Committee meets at the request of any of the participating members and/or by invitation of the coordinators, to discuss data use requests and to plan events, scientific dissemination, projects, and publications, among others.
The participants have priority in data use, and any of them can request data for specific uses. Most (but not all) of the data now compiled in HERBase have been made available in connection with already published articles. The advantages of participating in HERBase and requesting data internally are that a) these data have already passed through taxonomic standardization; b) the management of the database aims to minimize overlap among project proposals that would address similar research questions; and c) HERBase makes sure that the data owners are properly consulted in advance and allows them to decide whether the data will be used for the planned purpose or not, giving proper opportunities of authorship to all participants.
On the website dedicated to HERBase (www.gov.br/inpa/pt-br/projetos/herbase), the associated metadata are made openly available for anyone to explore. If interest in using the data emerges from this, requests for data use will be evaluated by the management committee and the data owners will be consulted. Detailed information on how to contribute to HERBase, as well as specific data requirements, can be found on the dedicated website. HERBase welcomes all datasets dedicated to the study of herbs from vegetation plots located within the limits of the Amazon biome as defined here. HERBase aims to contribute to the understanding of herb diversity in general with a broad biogeographic focus, so all types of data are welcome.

CONCLUSIONS
Although HERBase represents an important first step towards organizing Amazonian herb inventories, we recommend that future efforts in sampling herbs consider the current biases in plot location in Amazonia. The lack of data from areas in the west and east of the region, where other plant groups have been much more intensively sampled, precludes a general understanding of herb species diversity and its relationship with other components of biodiversity. We also strongly recommend that new sampling efforts aim at using standardized sampling methods and following the best practices outlined in Supplementary Material, Appendix S1. HERBase is an effort to integrate and fill gaps in the knowledge about the distribution of herb species in Amazonia, and we hope it will encourage more studies of understorey herbs. We encourage colleagues in possession of herb data to join the HERBase initiative.
If plots are used for permanent monitoring and individuals cannot be collected within the plot, individuals of the same morphotype should be searched for in the immediate vicinity outside the plot, and photos of individuals within the plot should be used for documentation;
(b) for each plot, make vouchers of at least one adult mature individual of each sampled morphotype or species recorded;
(c) for small individuals of ferns and lycophytes, collect the complete plant, including the rhizome. To avoid killing large individuals, sampling can be restricted to leaves only, but then the rhizome type should be documented by photographing it or describing it in the specimen metadata;
(d) document the variability of vegetative morphological characteristics of each species within each plot, at least with photos. Many important groups of Amazonian terrestrial herbs can be identified by experts from good photographs of the specimens. Take at least one photo of the complete individual, a branch with at least one full leaf, and any fertile parts that may be present (flowers, fruit, sori). We recommend depositing these georeferenced images in taxonomic databases;
(e) record, as much as possible, the size (e.g., height) of individuals;
(f) for flowering plants, when the plant is fertile, preferably preserve flowers and/or inflorescences in liquid (preferably 70 parts of ethanol, 27 parts of water, and 3 parts of glycerin) and deposit them under the same voucher number as the dried pressed voucher in the herbarium. This wet collection preserves the three-dimensional structure of the reproductive parts, which is important for the determination of most taxonomic groups;
(g) clearly record the bibliographic sources, names of specialist botanists, reference collections, and classification system used for species identification;
(h) sample the entire herb community, to the greatest extent possible, rather than only taxonomic subgroups. When subgroups are sampled, clearly document what they are;
(i) preserve samples in silica for genetic analysis.
Further information on materials to take to the field and how to collect samples on environmental variables can be found on the website of the Research Program on Biodiversity

4) Sampling protocol suggestion.
Many types of sampling designs and ways of measuring the occurrence or abundance of herbs exist, and different methods have been used by different researchers to answer the same or different questions. However, this diversity can become a problem when trying to combine datasets obtained with different methods, especially in large-scale integrative analyses. Thus, as a step towards better future data integration, we suggest a sampling protocol and discuss the reasons for adopting it, as well as possible adjustments so that its application is viable in different environments.
At least one representative voucher specimen of each species or morphospecies rooted in the plot should be collected, to ensure that reference material exists for identification confirmation and other kinds of studies. Many contemporary techniques can be applied for taxonomic identification of sterile material, such as FT-NIR spectroscopy, which works well for both angiosperms (e.g., Paiva et al. 2021) and ferns (G. Moulatlet pers. info). We obtained 16 spectral readings per individual from the adaxial and abaxial surfaces of 100 specimens belonging to 13 species. The analyses included all 1557 spectral variables. We tested different datasets (adaxial + abaxial, adaxial, and abaxial). Although herbaria generally prefer to receive fertile material, sterile specimens are usually accepted if they were collected in permanent plots and come with good enough metadata. Whenever possible, the entire plant should be photographed in the field, to record potentially important details for identification, such as habit, fertile organs, and any coloured parts. It is important to write down the camera file name in the field and relate it to the specimen registration number. The collected material must be labelled while still in the field and the registration number included in the field sheet. Collections should be kept in closed plastic bags to prevent plants from wilting before being pressed. Whenever possible, it is recommended to complement the collection made inside the plot with the collection of a whole plant of the same species outside the plot. However, in case of uncertainty in comparing morphotypes, make sure that complementary collections have a different number, to avoid mixed collections.
A complete count of terrestrial herb specimens ≥ 5 cm tall should ideally be carried out in each segment of the plot. The size criterion is a recommendation, as the inventory data compiled in HERBase include different height criteria. For ferns, the UTU protocol uses a minimum leaf length of 10 cm, which excludes fern gametophytes and small juveniles of the sporophytes, as these can be both numerous and difficult to identify. It is important to make sure to only include plants rooted within the established plot width. If in doubt whether a particular specimen is an obligate terrestrial herb or is only temporarily in the herbaceous stratum, collect it and record the status of the individual. Later, with a reliable determination of your samples, a decision to keep or exclude the individual plant from the sample can be made.
Regarding the type of plot, we recommend that users choose one of the most common types already represented in HERBase. For complete herb inventories, including angiosperm and fern data, most HERBase plots follow the terrain contour, while for fern data, most HERBase plots follow the protocol by Tuomisto et al. (2003b) of installing plots along the topographic gradient. Despite their methodological differences, contour plots and plots along the topographic gradient can be combined if metadata on sub-plots are available to allow proper selection of comparable units (Moulatlet et al. 2017; Zuquim et al. 2019).
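The inclusion rules above (rooted within the plot, minimum height of 5 cm, or minimum leaf length of 10 cm for ferns under the UTU protocol) can be applied to field records as in the following Python sketch. The record fields, morphospecies labels, and function names are hypothetical and only illustrate how one raw data sheet could be filtered under either criterion.

```python
from collections import Counter

MIN_HEIGHT_CM = 5.0       # general size criterion suggested in the text
MIN_FERN_LEAF_CM = 10.0   # minimum leaf length for ferns (UTU protocol)


def meets_criterion(record, fern_rule=False):
    """True if the individual is large enough to be counted."""
    if fern_rule and record["group"] == "fern":
        return record["leaf_length_cm"] >= MIN_FERN_LEAF_CM
    return record["height_cm"] >= MIN_HEIGHT_CM


def count_individuals(records, fern_rule=False):
    """Count individuals per species, keeping only plants rooted in the plot."""
    return Counter(
        r["species"]
        for r in records
        if r["rooted_in_plot"] and meets_criterion(r, fern_rule)
    )


# Made-up field records with hypothetical morphospecies labels.
records = [
    {"species": "Morphospecies A", "group": "angiosperm",
     "height_cm": 12.0, "leaf_length_cm": 8.0, "rooted_in_plot": True},
    {"species": "Morphospecies A", "group": "angiosperm",
     "height_cm": 3.0, "leaf_length_cm": 2.0, "rooted_in_plot": True},
    {"species": "Fern morphospecies 1", "group": "fern",
     "height_cm": 7.0, "leaf_length_cm": 8.0, "rooted_in_plot": True},
    {"species": "Fern morphospecies 1", "group": "fern",
     "height_cm": 15.0, "leaf_length_cm": 20.0, "rooted_in_plot": False},
]

counts_height = count_individuals(records)
counts_fern_rule = count_individuals(records, fern_rule=True)
```

Recording the measured size of each individual, instead of only a presence above the threshold, is what makes it possible to re-filter the same data under a different criterion later.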

Figure 1. Location of HERBase sampling plots included by June 2022: A - within geographic regions, as defined by Feldpausch et al. (2012). White dots represent plots with data included in the database; violet dots represent plots with only metadata available in the database; B - sampling density, according to the kernel estimation, varies from 0 to 1, as color-coded from dark green (0) to red (1); C - in relation to sampling plots of the PPBio program (https://ppbio.inpa.gov.br/). Violet dots represent plots with complete data sets included in HERBase; white dots represent plots with only metadata available in HERBase; D - in relation to ATDN tree-plot network (https://www.atdn.myspecies.info). White dots indicate plots with complete data included in HERBase; pink dots represent the distribution and coverage of tree inventories. This figure is in color in the electronic version.

Figure 2. Frequency of herb sampling plots included in HERBase until June 2022 according to area (m²) (A) and group(s) of herbs sampled (B).