Are we close to knowing the plant diversity of the Amazon?

Abstract: Amazonia is often cited as having the most diverse flora on the planet. However, the total number of species of higher plants in the region has been largely a matter of guesswork. Some recent publications have estimated the total number of species present, and these estimates indicate a lower overall diversity than was estimated in the past. However, analysis of the sampling density across the region, together with data from various sources, suggests that there are reasons why the recent figures may be considerable underestimates. I believe that much more investment in extensive collecting of quality plant specimens is needed to encounter the very large number of rare and local species that might never have been collected. Unfortunately, the tendencies of investment in botany, in terms of geography and types of project, suggest that we will probably not be able to accurately assess the real diversity of the region.


INTRODUCTION
In 1992 a project was funded to provide a species list and guide to the plants of the Adolfo Ducke Forest Reserve on the outskirts of Manaus. The reserve was selected because it was considered the botanically best-known area in Amazonia, with over 7000 registered plant collections and a little over 1000 species recorded over 40 years (Ribeiro et al. 1994). However, after 5 years of taxonomically directed collecting, the number of species known for the reserve approximately doubled (Hopkins 2005), and at least 50 species new to science were found. This raised the question of how many new species would be found if projects similar to the Ducke Flora were carried out in poorly known areas of Amazonia.
The low level of collection density across Amazonia is well documented. Estimates of average collection density across the region are between 0.1 and 0.2 collections per km². Furthermore, there is a very strong tendency for collection density to be high in very few localities, such as close to larger cities (Nelson et al. 1990, Schulman et al. 2007), and consequently far lower in more distant and more rural areas. If the collection density is low, the chances are that many species will not be represented in species lists, and that species with limited distributions in areas not visited by botanists will not have been collected. Recently, there have been more systematic attempts to list the total number of species present by assembling the taxonomic data based on herbarium collections (Flora do Brasil 2018, ter Steege et al. 2016) and by using modeling to estimate total species richness from the data obtained from forest inventories (ter Steege et al. 2013). By extrapolating the rank abundance curve of almost 5000 species of trees in forest plots, a total of approximately 16,000 species of trees was estimated for the Amazon Basin, of which approximately 6,000 would have populations of fewer than 1000 individuals.
The continuing Brazilian Flora project (Flora do Brasil 2018) is listing the flora of Brazil based on documentation in the literature and/or specimens preserved in collections. These estimates tend to be lower. The total number of Angiosperms (not only trees) listed for Amazonia (only Brazil) was 12,217 in 2010, 12,414 in 2015 and 12,848 in 2018 (Forzza et al. 2010, Brazil Flora Group 2015, Flora do Brasil 2018). Notably, the total flora of states in the southern third of Brazil often had longer species lists than much larger states in Amazonian Brazil. Ter Steege et al. (2016) also published a list of 11,676 tree species in Amazonia based on data from herbaria, a number which was heavily criticized by a group of botanists (Cardoso et al. 2017) who, using taxonomically verified data, reduced the list to only 6,727 tree species (and a total of 14,003 angiosperm species for all of the Amazon Basin).
The question I address in outline here is: are the estimates being published reasonable minimum (or maximum) estimates of Amazonian plant diversity, or are there reasons to believe that the tendencies in the history of collecting activity in the region might cause significant underestimation of the total diversity? These comments are in line with previous publications (Hopkins 2007, Milliken et al. 2011) and presage my ongoing research and that of my students.

WHAT SORTS OF INFORMATION SUGGEST THAT MANY MORE SPECIES MIGHT BE AS YET UNDESCRIBED?
The examples given here are largely based on data from monographs used in Hopkins (2007) and on a large personal data set of Amazonian collections, assembled and continuously updated as part of my research. Note that in the case of plant specimens the concept of the duplicate strongly affects the calculations. Most botanists make several duplicates of their collections, which are distributed to different herbaria, where they may follow different paths in terms of their databasing and identification. This dataset reassembles the duplicates to collection level by standardizing the collector name and number and by standardizing species names, correcting for synonymy. Nevertheless, it is a work in progress that undergoes continuous cleaning.
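The reassembly of duplicates described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used for the dataset: the field names, the surname heuristic (longest token of the first-listed collector), the synonym table and the sample records are all assumptions made up for the example.

```python
import re
from collections import defaultdict

def collection_key(collector, number):
    """Return a normalized (surname, number) key for one herbarium sheet.

    Hypothetical heuristic: real collector strings need far more care.
    """
    # Keep only the first-listed collector, dropping "et al." and co-collectors.
    primary = re.split(r"\bet al\b|&|;", collector, maxsplit=1)[0]
    # Replace everything that is not a letter with a space, then tokenize.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in primary.lower())
    tokens = cleaned.split()
    # Treat the longest token as the surname (initials are single letters).
    surname = max(tokens, key=len) if tokens else ""
    # Keep only the digits of the collector number ("No. 1234" -> "1234").
    digits = re.sub(r"\D", "", number)
    return surname, digits

def merge_duplicates(records, synonyms):
    """Group sheet-level records into collections, resolving synonyms."""
    merged = defaultdict(lambda: {"herbaria": set(), "names": set()})
    for rec in records:
        key = collection_key(rec["collector"], rec["number"])
        accepted = synonyms.get(rec["species"], rec["species"])
        merged[key]["herbaria"].add(rec["herbarium"])
        merged[key]["names"].add(accepted)
    return dict(merged)

# Invented sample data: two sheets of one collection held in two herbaria.
sheets = [
    {"collector": "Silva, J.A. et al.", "number": "No. 1234",
     "species": "Genus beta", "herbarium": "A"},
    {"collector": "J.A. Silva", "number": "1234",
     "species": "Genus alpha", "herbarium": "B"},
]
collections = merge_duplicates(sheets, {"Genus beta": "Genus alpha"})
print(len(collections), "collection(s) from", len(sheets), "sheets")
```

A collection whose merged record still carries more than one name after synonymy is resolved is a candidate for re-identification, which is one reason the cleaning is continuous.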

1) Data on collection frequency in herbaria.
While some species have been collected many times, probably because they are relatively conspicuous to collectors (widespread, locally common, regularly flowering, etc.), others have been rarely collected. The tallest column in Figure 1 (in this case for Sapotaceae, but the same is seen in most species-diverse families) is for species collected on only a single occasion. With further collecting activity, especially of the type employed in the Ducke project, that is to say directed towards collecting the rarer species, we would expect the curve to move to the right, and new species, previously uncollected, would appear on the left. This is an example of a veil line. In this case, the shape of the curve suggests unknown diversity hidden beyond the left axis of the graph.
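The kind of histogram discussed here (number of species per number of collections) can be tabulated from collection-level records as in the minimal sketch below. The function names and the sample list are assumptions for illustration and do not reproduce the data behind Figure 1.

```python
from collections import Counter

def collection_frequency_histogram(species_per_collection):
    """Map 'number of collections' -> 'number of species collected that often'.

    Input: one species name per collection (duplicates already merged).
    """
    collections_per_species = Counter(species_per_collection)
    return Counter(collections_per_species.values())

def singleton_share(histogram):
    """Fraction of species known from a single collection."""
    total_species = sum(histogram.values())
    return histogram.get(1, 0) / total_species if total_species else 0.0

# Invented sample: four species, two of them collected only once.
names = ["sp. A", "sp. A", "sp. A", "sp. B", "sp. C", "sp. D", "sp. D"]
hist = collection_frequency_histogram(names)
for n_collections in sorted(hist):
    print(n_collections, "collection(s):", hist[n_collections], "species")
```

Species never collected have, by definition, no entry in this table; that missing zero class is what lies behind the veil line.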
2) Sizes of species distributions.
Some species have wide geographic distributions, while others are very restricted in where they occur. The size of a species' range may be the result of a number of ecological factors, such as niche requirements limiting its distribution by type of soil, vegetation type, altitude or hydrological regime. Geographic or other factors might limit its current distribution, as might biotic factors such as its pollinators, herbivores, or competing species.
However, our knowledge of species distributions is also limited by collection intensity and by the adequacy of taxonomic study and identification. A species recorded as widespread might actually be several closely related species that are difficult to distinguish, while a species recorded rarely might be difficult to identify, flower rarely, or occur in areas historically unvisited by collecting botanists. Herbarium data, and data in large online datasets, are of limited use for assessing plant distributions in Amazonia. Only a small proportion of records are georeferenced (typically 15-30% in most sources), and these contain many errors. Automatic georeferencing in Amazonia is difficult because the locality data are often vague, incorrectly typed, or refer to places that do not appear on maps. General estimates based, for example, on centroids of municipalities are dangerous to use because many municipalities in Amazonia are enormous. In addition, many collectors do not record the municipality when collecting, or record it incorrectly. Furthermore, identification errors are very common in online and herbarium databases.
The best source of available geographical information on species' ranges is found in botanical taxonomic monographs in the style of Flora Neotropica. In these the author can be relied upon to have correctly identified the material examined, and to have made careful attempts to locate the collection localities manually. Using these data (Figure 2), it can be seen that relatively few species have widespread distributions (as measured by the number of 1 by 1 degree latitude/longitude squares from which they have been recorded). The most frequent outcome is for a species to be recorded from a single degree square. This indicates that most species of plants in Amazonia are not widely distributed, but occur only very locally. Given that much of Amazonia has not been botanically investigated, this again suggests that there is another veil line here, where many species that happen to occur in areas unvisited by botanists have yet to be collected.
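The range measure used above, the number of 1 by 1 degree squares from which a species has been recorded, could be computed from georeferenced records along the following lines. This is a sketch under stated assumptions: the record layout (species, latitude, longitude in decimal degrees) and the sample coordinates are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def degree_squares_per_species(records):
    """Count the distinct 1 x 1 degree squares occupied by each species.

    records: iterable of (species, latitude, longitude) in decimal degrees.
    """
    squares = defaultdict(set)
    for species, lat, lon in records:
        # math.floor, not int(): int(-2.9) gives -2, which would merge the
        # squares on either side of the equator and of the prime meridian.
        squares[species].add((math.floor(lat), math.floor(lon)))
    return {species: len(cells) for species, cells in squares.items()}

def range_size_histogram(squares_per_species):
    """Map 'number of squares occupied' -> 'number of species'."""
    return Counter(squares_per_species.values())

# Invented sample records.
sample = [
    ("sp. A", -2.93, -59.97),
    ("sp. A", -2.10, -59.50),   # same square as the record above
    ("sp. A", -3.20, -60.10),
    ("sp. B", -2.50, -59.50),
]
print(degree_squares_per_species(sample))
```

The count is only as good as the georeferencing and identifications behind it, which is why monograph data are preferred here over raw database records.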

3) Frequency in study plots.
Data from study plots where all plants above a certain size are cataloged and identified should indicate the degree of rarity or commonness locally. There are a number of practical problems in using these data, mostly associated with identification. Even in herbaria, identifiers of collections with flowers and/or fruits often make mistakes, and experience in many herbaria shows that we can expect 20-45% of specimens to bear an incorrect identification. Field identifications are even more difficult because the plants often lack the flowers or fruits on which taxonomic botanists principally base their identifications, and because the identifications are generally made by people without detailed experience of the groups being identified. Identification guides are generally not available, and using an existing one (such as Ribeiro et al. 1999) is likely to cause identification errors in areas distant from where it was researched. We can therefore expect that rarer and/or taxonomically little-known species will be harder to identify, and that similar species (such as morphologically similar congeneric species in hyperdiverse genera) will tend to be grouped. In the Ducke Reserve, we can have more confidence in identifications, and the unpublished data from a forest inventory there (Fig. 3 in Milliken et al. 2011) show that the most common pattern is for a species to be represented by a single individual in 56 ha. Again there is a veil line on the left axis, because the many species that were not found in the inventory plots do not appear in this graph.

4) Taxonomic discovery curves.
With more knowledge of a regional flora, especially with more collecting events, we will gradually get closer to knowing the total number of species that occur there. A curve of the number of species known over time should be asymptotic, gradually approaching the total number of species. However, such curves in Amazonia do not fit an asymptotic form well. For example, the species discovery curve for Sapotaceae in Amazonia (Figure 3) is more logarithmic in shape, and is much influenced by taxonomic treatments such as those by Pennington (1990, 2006), in which he described many species, mostly based on recent collections. In this case the veil line is to the right, with potentially more species to be found in the future, but this obviously depends on more collections being made.
Each of these analyses indicates that there are certainly more species to be found in Amazonia. But how many? Only a few or a very large number? If we combine these analyses, I believe it is clear that if we were able to make many more collections in areas unvisited or only superficially visited by botanists, many species with limited distributions would be found. If we were able to make collections over longer periods of time, the locally rarer species would be more likely to be collected in flower and fruit, and thus would be described. Given that the data suggest that most species are locally distributed and locally rare, the combination of the diversity behind the veil lines suggests that there is an enormous number of rare species to be found, many more than predicted in recent publications.
An interesting question is whether Amazonia is intrinsically different from other areas in terms of its record in taxonomic discovery. Using data from all Brazilian species, and charting the rate of accumulation of botanical knowledge by region (percent of species known over time, based on the date of publication of the earliest synonym), shows a difference in form between Amazonian Brazil and the four other regions. It is difficult to compare the five regions directly, as the point at which a region is close to a 100% catalog of its species is difficult to estimate. A possible proxy for comparing the taxonomic situation is to measure the difference (in years) between the dates at which each region achieved 50% of the taxonomic knowledge that we have today. Doing this (Figure 4) indicates that Amazonia is, by this conservative measure, 65 years behind all the other regions of Brazil. Repeating this analysis at state level (Figure 5) shows a strong negative tendency from the southeast to the northwest, with all the Amazonian states far behind the southern and northeastern states in terms of discovery of their floras.
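The 50% proxy described above can be made concrete with a short sketch. It assumes that each species in a region is represented by one year (the publication date of its earliest synonym) and uses invented years; the function names are hypothetical and this is not the code behind Figures 4 and 5.

```python
import math

def half_knowledge_year(description_years, fraction=0.5):
    """Earliest year by which `fraction` of the species known today
    had already been described."""
    years = sorted(description_years)
    if not years:
        raise ValueError("no description years supplied")
    # Number of species that must have been described to reach the fraction.
    needed = max(1, math.ceil(fraction * len(years)))
    return years[needed - 1]

def lag_in_years(region_years, reference_years, fraction=0.5):
    """Positive result: the region reached the fraction later than the reference."""
    return (half_knowledge_year(region_years, fraction)
            - half_knowledge_year(reference_years, fraction))

# Invented description years for two regions.
region_x = [1900, 1950, 1980, 2000]
region_y = [1800, 1850, 1900, 1950]
print(lag_in_years(region_x, region_y), "years behind")
```

Because the 100% point is defined by present knowledge rather than by the true flora, the measure is conservative: a region with many undescribed species will appear closer to completion than it really is.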

WILL WE DISCOVER THESE UNKNOWN SPECIES?
The only possible means to discover the missing biodiversity is through intensive collecting, especially in areas distant from cities. Given the size of Amazonia, the financial and administrative costs of undertaking long-term research are very high. Furthermore, basic, explorative research and investment in collections and taxonomic research are no longer prioritized. Collecting expeditions are considered "old-fashioned" if not linked with innovation, development, or ecological modeling on a global scale. Unfortunately, the consequences for conservation, modeling and development (based on inadequate taxonomy and consequently erroneous identifications) are an "inconvenient truth" for funding decision makers.

CAN AMAZONIA "CATCH UP"?
Another aspect of the botanical problems in Amazonia is the unequal geographical distribution of resources, both human and financial, in Brazil. Although Amazonia accounts for more than half of the territory of Brazil, recent specifically botanical programs have allocated only between 5 and 10% of the resources to Amazonia, with the vast majority being allocated to three states in the southeast: São Paulo, Rio de Janeiro and Minas Gerais. Human resources are also greatly skewed in the same way.
If it is thought to be important to have access to the genomes of Amazonian plants, or to know the correct identity of plants being exploited, or to know the real numbers of species present in any ecosystem, I think it is clear that there needs to be a massive reorganization of resources within Brazil, with a program of "biodiversity prospection" on a continental scale. Studying only the currently known species will result in poor planning, poor conservation and missed opportunities.