Molecular insights into baru (Dipteryx alata, Fabaceae) populations based on novel SSRs

ABSTRACT The baru tree (Dipteryx alata) is a fruit-bearing tree native to the Cerrado biome with significant socioeconomic importance. Its populations are a good model for using genetic information to study the effects of anthropogenic disturbance on the biome. In this study, we developed seven new polymorphic microsatellite markers for D. alata using an enriched genomic library. We characterized the loci in three populations and obtained a total of 49 alleles, with population averages of 5 to 5.57 alleles per locus. The loci were highly informative, as indicated by the mean unbiased expected heterozygosity (uHE), which ranged from 0.58 to 0.65 per locus across populations. Mean observed heterozygosity (HO) was also high, ranging from 0.73 to 0.85 per locus. Some pairs of loci were in linkage disequilibrium, namely DalatG6 with DalatB3, DalatH3 and DalatB4, and DalatB4 with DalatB5. The combined probability of paternity exclusion over all loci averaged 1.00, and the combined probability of identity ranged from 1.2 × 10^-5 to 4.4 × 10^-6. All markers are informative and suitable for studies of genetic diversity and population structure aimed at the conservation and management of the species.

Understanding how genetic variability is organized within populations is fundamental to developing conservation and sustainable use strategies for species and their habitats (Hoban et al. 2022). Dipteryx alata Vog. (Fabaceae), popularly known as the baru tree, is a native tree species with multiple uses, ranging from human and animal food to pharmaceutical applications (Soares et al. 2008). It is endemic to the Cerrado biome, and because of its wide distribution across the biome and its abundance in several of its habitats (Canuto et al. 2008), it has potential as a model for assessing the impact of human intervention on the environment through the characterization of its genetic diversity, genetic structure and effective population size (Collevatti et al. 2013). For this purpose, microsatellite markers (SSR, Simple Sequence Repeats) are considered excellent tools for assessing the genetic variability of populations because of their codominant inheritance, high degree of polymorphism, multiallelic nature and high reproducibility (Cosson et al. 2014). These markers therefore provide information on the long-term evolutionary history of species, mutation and population isolation, as well as on the mechanisms of genetic drift, gene flow and selection (Allendorf 2017).
Despite the importance of D. alata for the Cerrado, little is known about the genetic variability of its populations (Soares et al. 2008; Soares et al. 2012; Guimarães et al. 2017). Although species-specific SSR markers are already available (Soares et al. 2012; Guimarães et al. 2017), these studies report a low level of polymorphism at the loci, which limits the types of studies that can be carried out. Thus, to increase the number of loci representing the genome, we developed and characterized a set of new microsatellite loci for D. alata. These markers can be used in population studies at various scales to support the conservation and management of the species and the area it occupies.
For that purpose, we constructed a microsatellite-enriched genomic library following a protocol adapted from Billotte et al. (1999). An individual of D. alata was chosen at random in the Mário Viana Municipal Park, a Conservation Unit located in the municipality of Nova Xavantina, state of Mato Grosso, Brazil, and total genomic DNA was extracted from young leaves using the DNeasy® Plant commercial kit (Qiagen). A 5 µg DNA sample was digested with the restriction enzyme AfaI (Invitrogen). The digested fragments were ligated to the adapters Rsa21 and Rsa25 using T4 DNA ligase, and an enriched library was obtained by hybridization with the probes Biotin-IIIIII (CT)8 and Biotin-IIIIII (GT)8 followed by capture with streptavidin-coated magnetic beads (MagneSphere Magnetic Separation Products; Promega Corporation, Madison, Wisconsin, USA). The captured fragments were amplified by PCR with the adapter primer Rsa21 (10 µM), cloned into the pGEM-T Easy Vector (Promega, Madison, WI, USA) and inserted into competent Escherichia coli cells 10H10b by electroporation. Positive clones were selected by blue/white screening, and the presence and size of the cloned inserts were checked by PCR and electrophoresis on a 2% agarose gel. The best 48 clones were then sequenced on an ABI 3500xL Genetic Analyzer (Applied Biosystems, Foster City, California, USA) with the T7 and SP6 primers and the BigDye Terminator version 3.1 Cycle Sequencing Kit (Perkin Elmer-Applied Biosystems).
Microsatellite regions were identified using SSRIT (Simple Sequence Repeat Identification Tool) (Temnykh et al. 2001). Primer pairs were designed with Primer3Plus (Untergasser et al. 2007) using the following parameters: maximum primer length of 25 bp, primer melting temperature (Tm) from 52 °C to 65 °C, GC content from 40% to 60%, and amplification product size from 100 to 700 bp.
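The Primer3Plus constraints listed above can be expressed as a simple filter. The sketch below is illustrative only: the function names are hypothetical, and the melting temperature uses the basic GC-content approximation Tm = 64.9 + 41 × (nGC - 16.4) / N rather than the nearest-neighbor thermodynamic model that Primer3 actually uses, so its Tm values will not match Primer3Plus exactly.

```python
# Sketch of the primer-design constraints described above (hypothetical helpers).
# ASSUMPTION: Tm uses the basic GC-content formula, not Primer3's
# nearest-neighbor model, so values are approximate.


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate melting temperature: 64.9 + 41 * (nGC - 16.4) / N."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def passes_design_criteria(primer: str,
                           max_len: int = 25,
                           tm_range: tuple = (52.0, 65.0),
                           gc_range: tuple = (40.0, 60.0)) -> bool:
    """True if the primer meets the length, Tm and GC limits given in the text."""
    tm = basic_tm(primer)
    gc = gc_percent(primer)
    return (len(primer) <= max_len
            and tm_range[0] <= tm <= tm_range[1]
            and gc_range[0] <= gc <= gc_range[1])


def product_in_range(size_bp: int, lo: int = 100, hi: int = 700) -> bool:
    """True if the expected amplicon size lies within the 100-700 bp window."""
    return lo <= size_bp <= hi


if __name__ == "__main__":
    primer = "AGCTGACTGACTGATCGATCGA"  # arbitrary example, not a D. alata primer
    print(len(primer), gc_percent(primer), round(basic_tm(primer), 2))
    print(passes_design_criteria(primer))
```

In practice Primer3Plus applies these limits while searching for primers; a post hoc filter like this is only useful for checking candidate primers against the stated parameters.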
For genotyping, alleles were scored against the GeneScan-600 (LIZ) internal Size Standard Kit (Applied Biosystems, Foster City, CA, USA) using Geneious 8.1.6 (Kearse et al. 2012). Descriptive statistics were computed in GenAlEx 6.5 (Peakall and Smouse 2012): number of alleles per locus (NA), observed heterozygosity (HO), unbiased expected heterozygosity (uHE), fixation index (f), probability of identity (PI) and probability of paternity exclusion (PE). Linkage disequilibrium between loci was tested with FSTAT 2.9.3.2 (Goudet 2001).
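As an illustration of the per-locus statistics, the following sketch computes NA, HO, uHE and f for one locus in one population from diploid genotypes. It assumes uHE = (2N / (2N - 1)) × (1 - Σp²), the small-sample correction of expected heterozygosity, and, as a further assumption, computes f as 1 - HO/uHE; the text does not state which expected heterozygosity the reported f is based on, and GenAlEx may differ in such details. The function name is hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Per-locus diversity statistics for one population.

    genotypes: list of (allele1, allele2) tuples for diploid individuals,
    with missing data already removed. Allele labels can be sizes in bp.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]

    n_alleles = len(counts)                              # NA
    ho = sum(1 for a, b in genotypes if a != b) / n      # HO
    he = 1.0 - sum(p * p for p in freqs)                 # expected heterozygosity
    uhe = (2 * n / (2 * n - 1)) * he                     # small-sample corrected uHE
    # ASSUMPTION: f computed from uHE; negative f means heterozygote excess.
    f = 1.0 - ho / uhe if uhe > 0 else float("nan")
    return {"NA": n_alleles, "HO": ho, "uHE": uhe, "f": f}


if __name__ == "__main__":
    toy = [(150, 152), (150, 154), (152, 154), (150, 150)]  # invented genotypes
    print(locus_stats(toy))
```

In the toy example three of four individuals are heterozygous, so HO exceeds uHE and f comes out slightly negative, the same pattern the text reports for most loci.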
Of the 48 clones sequenced, 29% contained microsatellite regions. All microsatellites found in the library were perfect repeats with dinucleotide motifs.
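The search for perfect dinucleotide repeats, and the share of sequences that contain one, can be sketched with a regular expression. This is not SSRIT itself: the minimum of five repeat units and the function names are hypothetical choices made for illustration.

```python
import re


def find_dinucleotide_ssrs(seq, min_repeats=5):
    """Return (motif, start, n_repeats) for perfect dinucleotide repeats.

    min_repeats is a hypothetical threshold chosen for illustration.
    """
    seq = seq.upper()
    # Group 2 is a two-base unit whose bases differ (the lookahead excludes
    # mononucleotide runs), followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(r"((([ACGT])(?!\3)[ACGT])\2{%d,})" % (min_repeats - 1))
    return [(m.group(2), m.start(), len(m.group(1)) // 2)
            for m in pattern.finditer(seq)]


def fraction_with_ssr(seqs, min_repeats=5):
    """Fraction of sequences containing at least one perfect dinucleotide SSR."""
    with_ssr = sum(1 for s in seqs if find_dinucleotide_ssrs(s, min_repeats))
    return with_ssr / len(seqs)


if __name__ == "__main__":
    clone = "GGATCC" + "CT" * 8 + "GATTACA"  # invented insert sequence
    print(find_dinucleotide_ssrs(clone))
```

Excluding mononucleotide units inside the pattern, rather than filtering matches afterwards, keeps a poly-A run from consuming the first base of an adjacent (AC)n repeat.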
From the primer pairs designed, 10 loci were selected for synthesis and validation (Tab. 1). Of these, seven were polymorphic and three were monomorphic in the populations evaluated. The percentage of microsatellite-containing sequences obtained here is within the range of 20 to 90% reported for tropical plant species with the protocol of Billotte et al. (1999). Among the genetic diversity studies of D. alata (Soares et al. 2008; Soares et al. 2012; Guimarães et al. 2017), this is the first to develop SSR markers using the microsatellite-enriched genomic library technique. A total of 49 alleles were identified at the seven polymorphic loci, with the number of alleles per locus ranging from two (DalatB3) to twelve (DalatB4) (Tab. 2). The mean number of alleles per locus in the three populations ranged from 5 to 5.57 (Tab. 2). Observed heterozygosity ranged from 0.00 to 1.00 per locus depending on the population sampled, and expected heterozygosity ranged from 0.34 to 0.74 per locus, with overall means ranging from 0.73 to 0.85 for HO and from 0.58 to 0.65 for uHE (Tab. 2). Observed heterozygosity (HO) was greater than expected heterozygosity (uHE) for all loci except DalatB3 and DalatB4. Consequently, the fixation index (f) was negative for all loci except DalatB3 and DalatB4, with overall means ranging from -0.28 to -0.15 (Tab. 2).
Variation in the number of alleles per locus is common in microsatellite regions because of the high levels of polymorphism of these markers (Ellegren 2004), which is directly related to the high mutation rates in these regions (Gao et al. 2013). The HO > uHE relationship observed for most loci suggests an excess of heterozygotes relative to Hardy-Weinberg expectations. This excess may result from the reproductive characteristics of the species or from selection favoring heterozygotes in D. alata (Gomes and Moura 2010), leading to an absence of inbreeding at most loci (f values close to or below zero). This result is consistent with the mixed mating system with predominant outcrossing reported for D. alata (Tambarussi et al. 2017).
The probability of identity (PI) per locus ranged from 0.12 (DalatB4, DalatC3 and DalatF6) to 0.75 (DalatB3), and the average combined PI ranged from 1.2 × 10^-5 to 4.4 × 10^-6. The probability of paternity exclusion (PE) for the seven microsatellite loci ranged from 0.12 (DalatB3) to 0.70 (DalatB4); in the combined analysis, where neither parent is known, the average value was 1.00 (Tab. 2). The low combined PI of the marker set indicates that the probability of two individuals in a sample sharing the same multilocus genotype by chance is minimal (Val et al. 2020). The high combined PE of the developed SSR loci provides high reliability for correctly excluding a non-parent in paternity analysis (Rocha et al. 2018). The seven SSR loci were sufficient to distinguish among the ninety individuals analyzed, indicating that the marker set efficiently discriminates individuals in populations (Val et al. 2020).
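The per-locus and combined estimates above can be illustrated with commonly used formulas. In this sketch, which is an assumption about the estimators rather than a reproduction of GenAlEx, PI for unrelated individuals is 2(Σp²)² - Σp⁴, the exclusion probability for a single candidate parent when neither parent is known is 1 - 4a2 + 2a2² + 4a3 - 3a4 (with ak = Σpᵢᵏ), combined PI is the product of per-locus values, and combined PE is 1 - Π(1 - PE). Both combinations treat the loci as independent. Function names are hypothetical.

```python
from math import prod


def power_sum(freqs, k):
    """a_k = sum of p_i ** k over the alleles of one locus."""
    return sum(p ** k for p in freqs)


def prob_identity(freqs):
    """PI for unrelated individuals: 2 * (sum p^2)^2 - sum p^4."""
    return 2 * power_sum(freqs, 2) ** 2 - power_sum(freqs, 4)


def prob_exclusion_one_parent(freqs):
    """Exclusion probability for one candidate parent, neither parent known:
    1 - 4*a2 + 2*a2^2 + 4*a3 - 3*a4."""
    a2, a3, a4 = (power_sum(freqs, k) for k in (2, 3, 4))
    return 1 - 4 * a2 + 2 * a2 ** 2 + 4 * a3 - 3 * a4


def combined_pi(per_locus_pi):
    """Loci assumed independent: multiply per-locus PI values."""
    return prod(per_locus_pi)


def combined_pe(per_locus_pe):
    """Probability that at least one locus excludes a non-parent."""
    return 1 - prod(1 - pe for pe in per_locus_pe)


if __name__ == "__main__":
    biallelic = [0.5, 0.5]  # toy frequencies, not data from this study
    print(prob_identity(biallelic), prob_exclusion_one_parent(biallelic))
```

The multiplicative form explains why a handful of moderately informative loci can drive combined PI toward zero and combined PE toward one.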
According to the linkage disequilibrium (LD) test, most loci segregate independently, with no significant association between them, except for four pairs of loci (DalatG6 × DalatB3, DalatG6 × DalatH3, DalatG6 × DalatB4 and DalatB4 × DalatB5), for which LD remained significant after Bonferroni correction (α = 0.05/21 ≈ 0.00238, for the 21 pairwise comparisons among seven loci) (Tab. S1). This indicates that alleles at these loci were more strongly associated in the total sample than expected under random association. Several biological factors can increase LD, for example small population size, low recombination rate, natural or artificial selection and population admixture (Flint-Garcia et al. 2003). In our study, the observed LD may reflect selection, which would be consistent with the heterozygote excess detected (f < 0). However, further studies are required before these loci can be treated as linked blocks.
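With seven loci there are 7 × 6 / 2 = 21 pairwise tests, so the Bonferroni threshold is 0.05/21. The sketch below computes that threshold and illustrates a permutation test of genotypic association between two loci using a G (log-likelihood ratio) statistic. It is a simplified single-population analogue written for illustration, with hypothetical function names; FSTAT's own implementation (for example, how it combines samples) differs in detail.

```python
import random
from collections import Counter
from math import comb, log


def bonferroni_alpha(n_loci, alpha=0.05):
    """Per-test threshold for all pairwise LD tests among n_loci loci."""
    return alpha / comb(n_loci, 2)


def g_statistic(x, y):
    """Log-likelihood ratio (G) statistic for independence of two genotype lists."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    return 2 * sum(obs * log(obs * n / (count_x[a] * count_y[b]))
                   for (a, b), obs in joint.items())


def ld_permutation_test(x, y, n_perm=1000, seed=1):
    """P-value from shuffling genotypes at one locus among individuals."""
    rng = random.Random(seed)
    observed = g_statistic(x, y)
    y_perm = list(y)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if g_statistic(x, y_perm) >= observed - 1e-12:  # tolerance for float error
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(bonferroni_alpha(7))  # seven loci, 21 pairs
    locus1 = ["AA", "AB", "BB"] * 10  # invented, perfectly associated genotypes
    locus2 = ["CC", "CD", "DD"] * 10
    print(ld_permutation_test(locus1, locus2, n_perm=200))
```

Shuffling genotypes at one locus breaks any association with the other while preserving each locus's genotype frequencies, so the permutation distribution reflects the null hypothesis of independence.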
The developed microsatellite markers show high discriminating power and gene diversity. Adding these new loci to the SSR markers already available for D. alata will allow broader random sampling of the genome and, consequently, more precise estimates of genetic variability, contributing more effectively to the conservation of the species.

Table 1. Characteristics of the ten microsatellite loci developed for D. alata.

Table 2. Genetic parameters of microsatellite loci determined in three populations of D. alata.