DNA barcoding in Atlantic Forest plants: What is the best marker for Sapotaceae species identification?

The Atlantic Forest is a phytogeographic domain with a high rate of endemism and large species diversity. The Sapotaceae is a botanical family for which species identification in the Atlantic Forest is difficult. An approach that facilitates species identification in the Sapotaceae is urgently needed because this family includes threatened species and valuable timber species. In this context, DNA barcoding could provide an important tool for identifying species in the Atlantic Forest. In this work, we evaluated four plant barcode markers (matK, rbcL, trnH-psbA and the nuclear ribosomal internal transcribed spacer region - ITS) in 80 samples from 26 species of Sapotaceae that occur in the Atlantic Forest. ITS yielded the highest average interspecific distance (0.122), followed by trnH-psbA (0.019), matK (0.008) and rbcL (0.002). For species discrimination, ITS provided the best results, followed by matK, trnH-psbA and rbcL. Furthermore, the combined analysis of two, three or four markers did not result in higher rates of discrimination than obtained with ITS alone. These results indicate that the ITS region is the best option for molecular identification of Sapotaceae species from the Atlantic Forest.


Introduction
Tropical regions harbor a substantial portion of the world's biodiversity and some of the most diverse and threatened biomes on the planet. The Atlantic Forest is the second largest tropical forest in South America, with an original coverage of ~1.5 million km², of which only 11.4-16% remains (Ribeiro et al., 2009). The Atlantic Forest is considered a biodiversity hotspot (Myers et al., 2000) and has a highly diverse flora, with 16,146 plant species recorded, of which 7,524 are endemic (Forzza et al., 2010). Among the Atlantic Forest taxa whose species are difficult to identify is the Sapotaceae. This family consists of 53 genera and approximately 1,250 species with a pantropical distribution, most of which are found in tropical rainforests (Pennington, 1990). Many Sapotaceae species provide economically important products such as latex (used in the production of chewing gum), wood and fruits for human consumption (Pennington, 1990). Several species in this family also provide important resources for animals; for example, the golden-headed lion tamarin (Leontopithecus chrysomelas) relies on some Sapotaceae species for food and shelter (Oliveira et al., 2010).
Because of supra-annual flowering and intraspecific variation in vegetative morphology, flowers and fruits must be analyzed to correctly identify many Sapotaceae species. However, specimens with intact floral structures cannot always be obtained because the flowers of some species are ephemeral (Terra-Araujo et al., 2012). Therefore, additional methods, such as molecular tools, need to be developed to complement traditional identification. In this context, DNA barcoding, the use of short, standardized genomic regions for quick and accurate species identification (Hebert et al., 2003a), has aided molecular identification in several plant groups. This method benefits ecologists and conservationists by allowing samples to be identified when traditional methods cannot be applied (Hebert and Gregory, 2005).
A portion of the CO1 gene has been used successfully for the molecular identification of animal species (Hebert et al., 2003b). For plants, the rbcL and matK markers are recommended as DNA barcodes (CBOL Plant Working Group, 2009). However, these markers have poor discriminatory power in some taxa (Du et al., 2011; Guo et al., 2011; Zhang et al., 2012), so additional markers, such as the nuclear ribosomal internal transcribed spacer (ITS) and trnH-psbA, are required. Li et al. (2011) proposed that ITS/ITS2 be included among the regions formally recognized for the molecular identification of seed plants, highlighting the relevance of this marker. The ITS region is a good marker for phylogenetic studies in the Sapotaceae (Bartish et al., 2005; Swenson et al., 2007, 2008), and Gonzalez et al. (2009) indicated that ITS can help identify species in this family. However, the efficiency of different barcode markers for the molecular identification of Sapotaceae species has not been widely tested.
In this study, we evaluated the efficiency of the plastid markers matK, rbcL and trnH-psbA and the nuclear ribosomal ITS region for the identification of Sapotaceae species from the Atlantic Forest.

Materials and Methods
Fourteen Atlantic Forest fragments were sampled in the Brazilian state of Bahia (Figure 1). Eighty individuals representing 26 Sapotaceae species were collected (1-7 samples per species). All of the samples were identified to species level and voucher specimens were deposited in the CEPEC (Herbário do Centro de Pesquisas do Cacau) or ALCB (Herbário Alexandre Leal Costa) Herbaria (Table S1).
DNA was extracted from approximately 50 mg of leaf tissue per sample following the protocol of Doyle and Doyle (1987). Two recommended markers (matK and rbcL) and two markers suggested as additional barcodes for land plants (ITS and trnH-psbA) were amplified (Table 1). For PCR amplification of ITS, matK and rbcL, the reaction mixture consisted of 1x buffer (GoTaq, Promega), dNTPs (0.2 mM), primers (0.5 µM each), bovine serum albumin (BSA; 0.1 mg/mL), 1 unit of Taq DNA polymerase (GoTaq, Promega), DNA (10 ng) and ultra-pure water to a final volume of 20 µL. For matK and rbcL, the PCR program was 94°C for 2 min 30 s, followed by 10 cycles of 94°C for 30 s, 56°C for 30 s and 72°C for 30 s, then 25 cycles of 88°C for 30 s, 56°C for 30 s and 72°C for 30 s, and a final extension at 72°C for 10 min (Elisa Suganuma, pers. comm.). For the ITS region, the conditions were 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 50°C for 30 s and 72°C for 90 s, and a final extension at 72°C for 8 min (Bartish et al., 2005). For trnH-psbA amplification, the PCR mix consisted of 1x buffer (GoTaq, Promega), dNTPs (0.2 mM), primers (0.5 µM each), BSA (0.375 mg/mL), 1 unit of Taq DNA polymerase (GoTaq, Promega), DNA (10 ng) and ultra-pure water to a final volume of 15 µL. The PCR program consisted of 94°C for 2.5 min, followed by 35 cycles of 94°C for 30 s, 56°C for 30 s and 64°C for 1 min, and a final extension at 64°C for 10 min. Samples that showed weak bands were amplified with a Top Taq Master Mix kit (Qiagen) following the manufacturer's recommendations and the same amplification programs described above. The PCR products were purified by precipitation with polyethylene glycol (10% PEG 8000, 2.5 M NaCl) and sequenced in both directions using a Big Dye Terminator kit, version 3.1 (Applied Biosystems, Foster City, CA, USA) on an ABI 3130XL automated sequencer.
The sequences were edited using the Staden package (Staden et al., 1999) and submitted to GenBank (Table S1). The alignment was done using Muscle (Edgar, 2004) in conjunction with the Mega5 program (Tamura et al., 2011). All of the sequences were examined visually for possible errors in editing and alignment, and manual adjustments were made when necessary.
The success of PCR and sequencing was assessed according to Li et al. (2011). Pairwise distances were calculated in Mega5 (Tamura et al., 2011) using the Kimura 2-parameter (K2P) model (Kimura, 1980) to assess intra- and interspecific differences. We compared the interspecific pairwise divergences of single markers and marker combinations using a permutation test for differences between means (10,000 permutations). Species discrimination was evaluated with the "Best Match" and "Best Close Match" criteria implemented in TaxonDNA (Meier et al., 2006) and with neighbor-joining (NJ) analyses (Saitou and Nei, 1987), using single regions or different combinations of regions. Combined analyses were done only for samples in which all four regions were successfully sequenced. Only species for which multiple specimens were sequenced were included in the TaxonDNA analyses, and the "Best Close Match" threshold was calculated for each region (single and combined analyses) using the "Pairwise Summary" function. In the NJ analyses, a species was considered successfully discriminated when its multiple sequenced specimens formed a monophyletic group with bootstrap support ≥ 70%. The NJ analyses were done in Mega5 (Tamura et al., 2011) using the K2P model (Kimura, 1980) and pairwise deletion of indels. Branch support was estimated by bootstrapping with 1,000 replicates (Felsenstein, 1985).
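The K2P distances in this study were computed in Mega5. As a minimal, self-contained illustration of what that model measures, the Python sketch below (the function name `k2p_distance` is hypothetical, not part of Mega5) counts the proportions of transitions (P) and transversions (Q) over sites where both aligned sequences carry an unambiguous base, which is a pairwise deletion of gaps, and applies d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q). It is a sketch under these assumptions and is not expected to reproduce Mega5 output in every edge case.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion), so each pair uses its own set of comparable sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; purine<->pyrimidine is a transversion
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Toy sequences (not data from this study): one A->G transition over 10 sites
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

With no transversions the second logarithm vanishes, so a single transition in 10 sites gives -½ ln(0.8), slightly above the raw p-distance of 0.1.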

Results
Seventy-two ITS sequences were obtained for 24 Sapotaceae species, 78 matK sequences for 26 species, 80 rbcL sequences for 26 species and 69 trnH-psbA sequences for 25 species. The primers for these markers showed high amplification rates in Sapotaceae (Table 2). In the sequencing reactions, rbcL and matK gave the best results, followed by ITS and trnH-psbA. All of the markers produced aligned matrices longer than 500 bp. Indels were found in ITS, matK and trnH-psbA (Table 2). In the interspecific pairwise comparisons (single and combined analyses), the ITS region was the most divergent and rbcL the least divergent (p < 0.01) (Figure S1 and Table S2). The average interspecific distance for the ITS region was 40 times greater than the average intraspecific distance. Intra- and interspecific distances overlapped considerably in the plastid markers, whereas for ITS the overlap was small. Figure 2 compares the intra- and interspecific divergences. For identification at the species level, ITS performed the best of all the markers tested (Table 3). The matK marker had the second-best performance, followed by trnH-psbA and rbcL. In the ITS-based NJ analysis (Figure 3), only Manilkara maxima, M. multifida, Pouteria caimito and P. guianensis were not discriminated. The species Pouteria cuspidata, P. egreria, P. durlandii, P. grandiflora and Micropholis venulosa showed high levels of divergence and were distinct from the other Sapotaceae species for which multiple specimens were analyzed (Figure 3).
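The species-level identification rates were obtained with the "Best Match" and "Best Close Match" criteria of TaxonDNA (Meier et al., 2006). The Python sketch below reproduces the logic of those criteria on a precomputed distance matrix; it is an illustrative re-implementation, not TaxonDNA's code. Each sequence is queried against all others, ties at the smallest distance are kept, and the query is scored as correct, ambiguous, incorrect or (Best Close Match only) without a match. We assume here that the Best Close Match threshold is the distance below which 95% of intraspecific distances fall; the helper names and the toy matrix are hypothetical.

```python
import math
from itertools import combinations


def intraspecific_threshold(dist, labels, percentile=0.95):
    """Distance below which `percentile` of all intraspecific distances fall.

    Assumed here to mirror the Best Close Match threshold of Meier et al. (2006).
    """
    intra = sorted(
        dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )
    if not intra:
        raise ValueError("need at least one species with multiple specimens")
    k = max(math.ceil(percentile * len(intra)) - 1, 0)
    return intra[k]


def best_match(dist, labels, threshold=None, tol=1e-12):
    """Best Match (threshold=None) or Best Close Match (numeric threshold).

    'correct'   : every hit tied at the smallest distance is conspecific
    'ambiguous' : conspecific and allospecific hits are tied
    'incorrect' : no tied hit is conspecific
    'no match'  : Best Close Match only; nearest hit lies beyond the threshold
    """
    n = len(labels)
    outcomes = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = min(dist[i][j] for j in others)
        if threshold is not None and nearest > threshold:
            outcomes.append("no match")
            continue
        hit_species = {labels[j] for j in others if dist[i][j] - nearest <= tol}
        if hit_species == {labels[i]}:
            outcomes.append("correct")
        elif labels[i] in hit_species:
            outcomes.append("ambiguous")
        else:
            outcomes.append("incorrect")
    return outcomes


# Toy example: two hypothetical species, two specimens each
labels = ["A", "A", "B", "B"]
dist = [[0.00, 0.01, 0.05, 0.06],
        [0.01, 0.00, 0.05, 0.06],
        [0.05, 0.05, 0.00, 0.02],
        [0.06, 0.06, 0.02, 0.00]]
threshold = intraspecific_threshold(dist, labels)  # 0.02
print(best_match(dist, labels, threshold=threshold))
```

Identification success for a marker is then the fraction of "correct" outcomes; because Best Close Match can return "no match", it is the more conservative of the two criteria.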

Discussion
The success of species discrimination using the regions proposed for plant DNA barcoding by the CBOL Plant Working Group (2009) varies among plant groups (Hollingsworth et al., 2009; Newmaster and Ragupathy, 2009; Zhang et al., 2012). Depending on the taxon, additional markers may be needed for discrimination (CBOL Plant Working Group, 2009). This is particularly relevant to the Sapotaceae, in which the plastid markers have relatively poor resolution. Although ITS performed worse than matK and rbcL in the sequencing reactions, it showed high species-level resolution. Desirable features of DNA barcodes include primer universality, sequencing success and species discrimination (Kress et al., 2005; CBOL Plant Working Group, 2009; Hollingsworth et al., 2011). In this work, all of the tested markers showed high amplification rates. In terms of sequences obtained, rbcL was the most effective marker, in agreement with Ren et al. (2010) and Gu et al. (2011). The matK marker performed almost as well, failing in only one sample. The lower performance of trnH-psbA compared to the other markers resulted from difficulties in sequencing this marker in Sapotaceae, probably because of mononucleotide repeats (> 10 bp) that undermined the sequencing reactions. Devey et al. (2009) reported such repeats in many species and showed how these microsatellites hinder the generation of high-quality sequences, exactly as observed here. For ITS, the sequencing success rate was reasonable but lower than for matK and rbcL. Nevertheless, ITS was highly discriminatory and useful for the molecular identification of Sapotaceae species.
The most desirable characteristic of a DNA barcode is successful species discrimination. By this criterion, ITS was useful in the Sapotaceae because of its high interspecific and low intraspecific distances. In addition, the ITS region showed little overlap between intra- and interspecific Kimura 2-parameter distances, resulting in high species-level resolution. NJ analyses showed that only four species (M. maxima, M. multifida, P. caimito and P. guianensis) were not identified with ITS data. This result may reflect recent divergence and the retention of ancestral ITS polymorphisms in the original populations, which may limit the usefulness of ITS for identifying these species. In addition, low rates of divergence may be observed in some groups of tree species because their long generation times result in lower rates of mutation (Kay et al., 2006).
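The separation between intra- and interspecific distances discussed above (the so-called barcode gap) can be summarized numerically. The Python sketch below is one common way to do so and is not the procedure used for Figure 2: it splits a pairwise distance matrix into intra- and interspecific values, reports the ratio of their means (the statistic behind the 40-fold ITS difference in the Results), and counts interspecific pairs that fall within the intraspecific range. Function names and the toy matrix are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def split_distances(dist, labels):
    """Split the upper triangle of a distance matrix into intra/interspecific values."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    return intra, inter


def barcode_gap_summary(dist, labels):
    """Summarize how well intra- and interspecific distances are separated."""
    intra, inter = split_distances(dist, labels)
    mean_intra, mean_inter = mean(intra), mean(inter)
    max_intra, min_inter = max(intra), min(inter)
    return {
        "mean_intra": mean_intra,
        "mean_inter": mean_inter,
        # ratio of average inter- to intraspecific distance (infinite if no intra variation)
        "ratio": mean_inter / mean_intra if mean_intra > 0 else math.inf,
        # positive gap: every interspecific pair is farther apart than any conspecific pair
        "gap": min_inter - max_intra,
        # interspecific pairs lying within the intraspecific range (overlap)
        "overlapping_inter_pairs": sum(d <= max_intra for d in inter),
    }


# Toy example: two hypothetical species, two specimens each
labels = ["A", "A", "B", "B"]
dist = [[0.00, 0.01, 0.05, 0.06],
        [0.01, 0.00, 0.05, 0.06],
        [0.05, 0.05, 0.00, 0.02],
        [0.06, 0.06, 0.02, 0.00]]
print(barcode_gap_summary(dist, labels))
```

A marker like ITS in this study would show a large ratio and few overlapping pairs, whereas the plastid markers would show many interspecific pairs inside the intraspecific range.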
Manilkara salzmannii showed great phenotypic plasticity in vegetative characters despite high values of intraspecific divergence. The high intraspecific divergence observed in P. reticulata, coupled with the large phenotypic plasticity of its vegetative characters, suggests that this taxon may represent a species complex, although this hypothesis requires further study. Chromolucuma apiculata and Pouteria gardneri (both recognized on the basis of morphological characters) showed very low divergence (0.5%), indicating that they may belong to the same genus; this could reflect homoplasy in the morphological characters used to delimit the genus Chromolucuma.
In a preliminary analysis of a portion of the ITS region, Yoccoz et al. (2012) reported that this region discriminated Sapotaceae species more efficiently than plastid markers. Furthermore, Gonzalez et al. (2009) indicated the potential of ITS for the molecular identification of Sapotaceae species in the Amazon region. Our results corroborate those of Ren et al. (2010) for Alnus spp., Yan et al. (2011) for Primula spp., Guo et al. (2011) for Hedyotis spp., and Du et al. (2011) for Potamogetonaceae, in all of which the ITS region discriminated species well. Similarly, Singh et al. (2012) reported a species resolution of 100% in samples of the genus Dendrobium, indicating that in some cases this region alone is sufficient for the molecular identification of plant species.
The plastid markers trnH-psbA, matK and rbcL performed worse than ITS alone, with low interspecific distances that overlapped with intraspecific distances (Figure 2). For example, no Manilkara species were identified with these markers. This low success limits the usefulness of plastid markers for molecular identification in the Sapotaceae. The result may be explained by the lower mutation rate of the plastid genome compared with the nuclear genome (Wolfe et al., 1987). In the combined analyses, the combination proposed by CBOL (matK+rbcL) performed poorly as a plant barcode, as did other combinations that did not include ITS. Combined analyses that included ITS worked well but were never superior to ITS alone. This finding further strengthens the potential of ITS by itself as a plant barcode for future work with the Sapotaceae.
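The combined analyses used concatenated alignments restricted to samples in which all four regions were sequenced. A minimal Python sketch of that data-preparation step is shown below, assuming each marker alignment is held as a dictionary from sample ID to aligned sequence; the function name, sample IDs and toy sequences are hypothetical, and the concatenated matrix would then be analyzed exactly like a single-marker alignment.

```python
def concatenate_markers(alignments: dict[str, dict[str, str]],
                        markers: list[str]) -> dict[str, str]:
    """Concatenate the chosen marker alignments sample by sample.

    Only samples present in *every* alignment are kept, so each combination
    (two, three or four markers) is built from the same set of samples.
    """
    complete = set.intersection(*(set(aln) for aln in alignments.values()))
    for marker in markers:
        lengths = {len(seq) for seq in alignments[marker].values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker} alignment has unequal sequence lengths")
    return {
        sample: "".join(alignments[marker][sample] for marker in markers)
        for sample in sorted(complete)
    }


# Toy alignments: sample s3 lacks matK and trnH-psbA, so it is dropped
alignments = {
    "ITS":       {"s1": "AC", "s2": "AG", "s3": "TT"},
    "matK":      {"s1": "GG", "s2": "GC"},
    "rbcL":      {"s1": "T",  "s2": "T",  "s3": "A"},
    "trnH-psbA": {"s1": "C-", "s2": "CC"},
}
print(concatenate_markers(alignments, ["ITS", "matK"]))
```

Restricting every combination to the same complete set of samples keeps single and combined analyses comparable, at the cost of discarding samples with any missing region.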
Taxonomic status is an essential consideration in adopting the appropriate conservation strategies and management plan for a given species. Using ITS alone for the molecular identification of Sapotaceae species opens new opportunities for studies of this family, allowing easier and faster identification from sterile material. Given estimates that > 50% of the species in this family are not yet known to science (Joppa et al., 2010), this technique may help resolve specific taxonomic problems and be useful in the initial screening of potential new species for further taxonomic characterization. Based on the results of this study, we suggest the ITS region as the best option for the molecular identification of Sapotaceae species in the Atlantic Forest, and highlight the potential of this marker for identifying other species of this family. An integrated taxonomic approach to the Sapotaceae should help uncover the hidden diversity in this family.

Supplementary Material
The following online material is available for this article: Figure S1 -Boxplot of K2P distances between the Sapotaceae species considered in this study using ITS, matK, rbcL and trnH-psbA markers.

Associate Editor: Fabrício Rodrigues dos Santos
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
