High levels of genetic differentiation and selfing in the Brazilian cerrado fruit tree Dipteryx alata Vog. (Fabaceae)

Dipteryx alata is a native fruit tree species of the cerrado (Brazilian savanna) that has great economic potential because of its multiple uses. Knowledge of how the genetic variability of this species is organized within and among populations would be useful for genetic conservation and breeding programs. We used nine simple sequence repeat (SSR) primers developed for Dipteryx odorata to evaluate the genetic structure of three populations of D. alata located in central Brazil, based on the analysis of leaf samples from 101 adults. The outcrossing rate was evaluated using 300 open-pollinated offspring from 25 seed-trees. Pollen dispersal was measured by parentage analysis. We used spatial genetic structure (SGS) to determine the minimal distance between trees for harvesting seeds in conservation and breeding programs. Our data indicate that the populations studied had a high degree of genetic diversity and population structure, as suggested by the high level of divergence among populations. The estimated outcrossing rate suggested a mixed mating system, and the intrapopulation fixation index was influenced by SGS. We conclude that seed harvesting for genetic conservation and breeding programs requires a minimum distance of 196 m between trees to avoid collecting seeds from related seed-trees.


Introduction
The cerrado (Brazilian savanna) is the second largest Brazilian biome and a hotspot of global biodiversity (Myers et al., 2000). This hotspot designation was based on the biological diversity of the region, the number of endemic plant species and the urgent need to protect this region from major human impacts (Myers et al., 2000). Intense and disorderly exploitation through agricultural expansion and livestock production has modified more than 50% of the cerrado (Klink and Machado, 2005). Large populations of notable native food resources such as Dipteryx alata and Caryocar brasiliense have been destroyed and continue to be neglected (Pott and Pott, 2003). Given the current need for conservation, forest restoration and food security, there is a great demand for genetic information on native plants.
Among cerrado plants with a potential for exploitation, D. alata, locally known as "baru", has a wide range of applications, e.g., as human and animal food, a substrate for the pharmaceutical industry and construction wood (Siqueira et al., 1993). This species is a native fruit tree of the family Fabaceae, widely distributed in the cerrado (Ribeiro and Walter, 2000). The hermaphroditic flowers of D. alata are pollinated by diverse insects and its seeds are gravity- and animal-dispersed (Ribeiro and Walter, 2000). The genetic variability among populations of D. alata was described by Soares et al. (2007) using random amplified polymorphic DNA (RAPD). Because of the dominant nature of RAPD markers, the levels of within-population inbreeding were estimated using restricted models and the mating system was not determined, leaving an information gap that needs to be filled.
Microsatellite markers, or simple sequence repeats (SSR), have been used in natural population studies because they are codominant and highly polymorphic compared with other classes of markers. SSR markers have been widely used to address several questions in conservation and population genetics, such as population structure, intra-population spatial genetic structure, contemporary pollen and seed dispersal and mating systems (Bittencourt and Sebbenn, 2007; Dick et al., 2008; Hanson et al., 2008).
The main objective of this study was to obtain genetic information that could be useful for the domestication and breeding of D. alata, and its conservation. The genetic diversity, population structure, gene flow and mating system of this species were studied using nuclear SSR primer pairs originally developed for Dipteryx odorata.

Genetic material
Leaf samples from 101 adult trees of D. alata were collected from three populations (Figure 1) at least 219 km apart in the State of Minas Gerais (MG), municipality of Campina Verde (n = 30; 19°31'35" S, 50°01'23" W), in the State of Goias (GO), municipality of Itarumã (n = 30; 18°44'36" S, 51°14'58" W), and in the State of Mato Grosso do Sul (MS), municipality of Brasilândia (n = 41; 21°14'24" S, 52°01'12" W). All trees in the populations were located in pastures at a low population density, where they had been left after land clearing for agricultural expansion with the sole purpose of offering shade for livestock.
The geographic location of the adults in all of the areas studied was mapped with a global positioning system (GPS) and leaf samples for genetic analysis within a population were collected from trees separated by an average distance of 40 m. The mating system was studied using open-pollinated seeds collected from 25 seed-trees in the MS population. Twelve seedlings were randomly sampled from each seed-tree, i.e., a total of 300 offspring.

Microsatellite markers and DNA isolation
DNA was extracted from the leaves of adult trees and offspring using the CTAB method described by Doyle and Doyle (1987). DNA was quantified by agarose gel electrophoresis and diluted to a concentration of 2.5 ng/μL. Each amplification reaction (12.74 μL) contained 3 μL of genomic DNA, 1.3 μL of 10X PCR buffer (10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, pH 8.3), 4.30 μL of forward and reverse primers (1 μM), 0.2 μL of Taq DNA polymerase (5 U/μL, Phoneutria), 1.3 μL of BSA (2.5 mg/mL) and 1.34 μL of ultrapure water. Amplifications were done using an MJ Research PT-100 thermal cycler with the following program: 5 min at 94 °C; 30 cycles of 1 min at 95 °C, 1 min at the annealing temperature specific to each primer (Table 1) and 1 min at 72 °C; and a final elongation step of 7 min at 72 °C. The amplification products were separated on 5% (w/v) polyacrylamide gels by electrophoresis at 55 W for 1.5 h. A 10 bp ladder (Invitrogen®) was used as a size marker. The gels were stained with silver nitrate (Creste et al., 2001). Since microsatellite primers have not yet been developed for D. alata, we used 42 primer pairs previously developed by one of us for D. odorata (Vinson CC, MSc dissertation, Universidade Federal do Pará, Belém, 2004), obtained from CENARGEN-Embrapa (National Research Center of Genetic Resources and Biotechnology, Brazilian Agricultural Enterprise).

Genetic diversity analysis
The mean number of alleles per locus (A), the mean observed ($H_o$) and expected ($H_e$) heterozygosities, and the fixation index for each population ($f_i$) were estimated using the program FSTAT 2.9.3 (Goudet, 2002). The significance of the $f_i$ values was tested using a Monte-Carlo permutation approach implemented in FSTAT 2.9.3. This program was also used to test for deviations from Hardy-Weinberg equilibrium and for linkage disequilibrium, with 1,000 allelic permutations among individuals and P-values adjusted for a significance level of α = 0.05. P-values from multiple tests were adjusted with the Bonferroni correction.
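As a minimal illustration of the per-locus statistics above, the Python sketch below computes $H_o$, an unbiased (Nei-type) $H_e$ and a simple moment fixation index $f = 1 - H_o/H_e$ from diploid genotypes, plus a Bonferroni-adjusted threshold. It is not the Weir and Cockerham (1984) estimator that FSTAT reports, and the function names and toy genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Ho, He, f) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies estimated from the 2n sampled gene copies
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]
    # Unbiased expected heterozygosity (sample-size corrected by 2n / (2n - 1))
    he = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))
    # Moment fixation index: relative heterozygote deficit, f = 1 - Ho / He
    f = 1 - ho / he if he > 0 else float("nan")
    return ho, he, f


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


ho, he, f = locus_diversity([(1, 1), (1, 2), (2, 2), (1, 2)])
print(round(ho, 3), round(he, 3), round(f, 3))  # 0.5 0.571 0.125
print(round(bonferroni_threshold(0.05, 9), 4))  # 0.0056
```

With nine loci tested at α = 0.05, the Bonferroni threshold is 0.05/9 ≈ 0.0056.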
Genetic structure was quantified using the Weir and Cockerham (1984) parameters ($\hat{f} = F_{IS}$, $\hat{F} = F_{IT}$ and $\hat{\theta}_P = F_{ST}$). Since most microsatellite mutations involve the addition or subtraction of a small number of repeat units according to a stepwise mutation model, population differentiation was also assessed by Slatkin's corrected $R_{ST}$ coefficient (Goodman, 1997). To improve the information about interpopulation genetic differentiation, we also used the correction proposed by Hedrick (2005) for the $G_{ST}$ measure (Nei, 1973), namely $\hat{G}'_{ST} = \hat{G}_{ST}(1 + \hat{H}_S)/(1 - \hat{H}_S)$, where $H_S$ is the intrapopulation genetic diversity (Nei, 1973). These statistics were computed with a significance test using FSTAT. The historic gene flow ($Nm$) among populations was estimated indirectly using the island model proposed by Crow and Aoki (1984), which corrects the analyses for a finite number of populations: $\hat{N}m = \frac{1}{4\alpha}\left(\frac{1}{\hat{F}_{ST}} - 1\right)$. In this expression, the genetic divergence between populations ($F_{ST}$) was substituted by $\hat{\theta}_P$, $\hat{R}_{ST}$ and $\hat{G}'_{ST}$, and the correction for a finite number of populations ($n$) was calculated as $\alpha = [n/(n-1)]^2$.
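The two corrections described above are simple closed-form expressions. The sketch below assumes the forms $G'_{ST} = G_{ST}(1 + H_S)/(1 - H_S)$ and $Nm = (1/4\alpha)(1/F_{ST} - 1)$ with $\alpha = [n/(n-1)]^2$. The inputs are illustrative values, not the estimates obtained for D. alata, and the function names are hypothetical.

```python
def hedrick_gst_prime(gst, hs):
    """Standardized differentiation, assuming G'ST = GST (1 + HS) / (1 - HS)."""
    return gst * (1 + hs) / (1 - hs)


def crow_aoki_nm(fst, n_pops):
    """Historic gene flow under a finite island model.

    Nm = (1 / (4 * alpha)) * (1 / FST - 1), with alpha = [n / (n - 1)]^2.
    """
    alpha = (n_pops / (n_pops - 1)) ** 2
    return (1 / fst - 1) / (4 * alpha)


# Illustrative inputs only (not the D. alata estimates)
gst, h_s = 0.20, 0.60
print(round(hedrick_gst_prime(gst, h_s), 3))  # 0.8
print(round(crow_aoki_nm(gst, n_pops=3), 3))  # 0.444
```

In the same way, $\hat{\theta}_P$, $\hat{R}_{ST}$ or $\hat{G}'_{ST}$ can be passed in place of $F_{ST}$ to obtain the three $Nm$ estimates. With only three populations, $\alpha = 2.25$ noticeably lowers $Nm$ relative to the infinite-island value.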

Mating system analysis
The program MLTR 3.2 (Ritland, 2002), which is based on mixed outcrossing and correlated outcrossing models, was used to estimate the single-locus and multilocus outcrossing rates for the MS population. We estimated the following population and family parameters: 1) the multilocus outcrossing rate ($\hat{t}_m$) by the maximum likelihood method, 2) the single-locus outcrossing rate ($\hat{t}_s$), 3) the outcrossing rate between related individuals ($\hat{t}_m - \hat{t}_s$), 4) the paternity correlation ($\hat{r}_p$) and 5) the selfing correlation ($\hat{r}_s$) for the population. The standard errors of the estimates were obtained from 10,000 bootstrap resamplings in which the resampling units were the families.
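Several quantities discussed later are simple functions of these MLTR estimates: the selfing rate $s = 1 - t_m$, outcrossing among relatives $t_m - t_s$, and the effective number of pollen donors $1/r_p$. The Python sketch below is a minimal illustration. $t_m$ is the MS estimate, $t_s$ is a hypothetical value, and $r_p = 2/3$ is chosen only so that $1/r_p$ equals the reported 1.5.

```python
def mating_summary(tm, ts, rp):
    """Derived mating-system quantities from MLTR-style estimates."""
    return {
        "selfing_rate": 1 - tm,             # s = 1 - tm
        "biparental_inbreeding": tm - ts,   # outcrossing among relatives
        "effective_pollen_donors": 1 / rp,  # Nep = 1 / rp
    }


# tm from the MS population; ts is hypothetical; rp chosen so that 1/rp = 1.5
summary = mating_summary(tm=0.711, ts=0.600, rp=2 / 3)
for name, value in summary.items():
    print(name, round(value, 3))
```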
Null alleles were detected by comparing seed-tree genotypes with their offspring in the MS population using the program MLTR. Null alleles, rather than mistyping, were considered to be present when a homozygous seed-tree mismatched its offspring genotype. In the presence of null alleles the homozygous seed-tree genotype was corrected to a heterozygous form (observed allele/null allele) and the offspring genotypes were corrected through an expected Mendelian segregation ratio of 1:1, as described by Liewlaksaneeyanawin et al. (2002).
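The detection rule above can be sketched as follows. A homozygous seed-tree A/A must transmit A to every seed, so an offspring carrying no A suggests a maternal null allele. This is a simplified illustration under stated assumptions. It recodes the seed-tree as A/null and an apparently homozygous mismatched offspring B/B as B/null, but it does not implement the 1:1 segregation-based correction of Liewlaksaneeyanawin et al. (2002). The `NULL` label and the function names are hypothetical.

```python
NULL = "null"  # hypothetical label for the non-amplifying allele


def maternal_mismatches(mother, offspring):
    """Indices of offspring that share no allele with the seed-tree genotype."""
    return [i for i, (a, b) in enumerate(offspring)
            if a not in mother and b not in mother]


def recode_for_null(mother, offspring):
    """Simplified recoding when a homozygous seed-tree mismatches its offspring.

    The seed-tree A/A becomes A/null; apparently homozygous mismatched offspring
    B/B become B/null (paternal B, maternal null). Mismatched heterozygotes
    (B/C) cannot be explained by a single null allele and are left unchanged.
    """
    a1, a2 = mother
    if a1 != a2:
        raise ValueError("recoding applies only to homozygous seed-trees")
    mismatched = maternal_mismatches(mother, offspring)
    if not mismatched:
        return mother, list(offspring), []
    recoded = list(offspring)
    for i in mismatched:
        x, y = offspring[i]
        if x == y:
            recoded[i] = (x, NULL)
    return (a1, NULL), recoded, mismatched


mother = (120, 120)
family = [(120, 124), (124, 124), (120, 120), (126, 128)]
new_mother, new_family, flagged = recode_for_null(mother, family)
print(new_mother, flagged)
print(new_family)
```

The offspring (120, 120) in this toy family could also be 120/null. Resolving that ambiguity is the purpose of the Mendelian 1:1 correction, which this sketch leaves out.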

Minimal number of seed-trees
The minimal number of seed-trees ($m$) needed for harvesting seeds in conservation, management and breeding programs was calculated as described by Sebbenn (2002), $m = N_{e(reference)}/\hat{N}_{e(v)}$. In this method, 1) $N_{e(reference)}$ is the minimum effective sample size of seeds to be collected (50, 500 and 1000) and 2) $N_{e(v)}$ is the variance effective size for a single seed-tree progeny of size $n$ ($n$ = 100), estimated by $\hat{N}_{e(v)} = 0.5/[\hat{\Theta}_{xy}(n-1)/n + (1 + \hat{F}_s)/(2n)]$, where $\hat{F}_s$ is the fixation index in offspring and $\hat{\Theta}_{xy}$ is the average coancestry coefficient within families, estimated as half of the relatedness coefficient, $\hat{\Theta}_{xy} = 0.125(1 + \hat{F}_a)[4\hat{s} + (\hat{t}_m^2 + \hat{t}_m\hat{s}\hat{r}_s)(1 + \hat{r}_p)]$, where $\hat{F}_a$ is the fixation index in the adult population and $\hat{s}$ is the selfing rate, $\hat{s} = 1 - \hat{t}_m$.
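The calculation of the minimal number of seed-trees reduces to two lines of arithmetic, assuming $N_{e(v)} = 0.5/[\Theta_{xy}(n-1)/n + (1+F_s)/(2n)]$ and $m = N_{e(reference)}/N_{e(v)}$. The sketch below uses the MS values reported in the results ($\hat{\Theta}_{xy}$ = 0.171, $\hat{F}_s$ = 0.365, $n$ = 100). Rounding $m$ to the nearest whole tree is an assumption; rounding up would be the more conservative choice.

```python
def variance_effective_size(theta_xy, fs, n):
    """Variance effective size of one open-pollinated progeny of size n.

    Ne(v) = 0.5 / [theta_xy * (n - 1) / n + (1 + Fs) / (2 n)]
    """
    return 0.5 / (theta_xy * (n - 1) / n + (1 + fs) / (2 * n))


def seed_trees_needed(ne_reference, ne_v):
    """Number of seed-trees m = Ne(reference) / Ne(v), rounded to a whole tree."""
    return round(ne_reference / ne_v)


# Values reported for the MS population
ne_v = variance_effective_size(theta_xy=0.171, fs=0.365, n=100)
print(round(ne_v, 2))                                              # 2.84
print(seed_trees_needed(50, ne_v), seed_trees_needed(1000, ne_v))  # 18 352

# Half-sib reference: theta = 0.125 and Fs = 0
print(round(variance_effective_size(0.125, 0.0, 100), 2))          # 3.88
```

For half-sib families, $N_{e(v)}$ approaches 4 as $n$ grows. The higher within-family coancestry and seed inbreeding in the MS population lower it to about 2.84.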

Paternity analysis
Paternity analysis of each seed was done by maximum-likelihood paternity assignment using the program CERVUS 3.0 (Kalinowski et al., 2007). All 41 trees sampled in the MS population were considered as putative pollen donors of the seeds. Paternity was based on the Δ statistic (Marshall et al., 1998). The critical value of Δ for each confidence level of paternity analysis was determined by running simulations with CERVUS 3.0 based on 50,000 replications with a 95% confidence level and assuming a 0.01 proportion of mistyped loci. The possibility of selfing was also considered. The total exclusion probability of the parent pair was also estimated for all sampled individuals in the population.
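Two quantities in this paragraph can be illustrated compactly. A multilocus exclusion probability combines per-locus probabilities as one minus the product of per-locus non-exclusion probabilities. The Δ statistic is the difference in LOD score between the most likely and the second most likely candidate father. The Python sketch below assumes those definitions. The per-locus probabilities and LOD scores are hypothetical, and CERVUS's own likelihood model (including mistyping) is not reproduced.

```python
import math


def combined_exclusion(per_locus_exclusion):
    """Multilocus exclusion probability: 1 - product of per-locus non-exclusion."""
    return 1 - math.prod(1 - p for p in per_locus_exclusion)


def delta_statistic(lod_scores):
    """Delta = LOD of the most likely candidate minus that of the runner-up."""
    if len(lod_scores) < 2:
        raise ValueError("need at least two candidate fathers")
    first, second = sorted(lod_scores, reverse=True)[:2]
    return first - second


# Hypothetical inputs: eight loci, each excluding 40% of non-parent pairs
print(round(combined_exclusion([0.4] * 8), 4))      # 0.9832
# Hypothetical LOD scores for three candidate pollen donors of one seed
print(round(delta_statistic([3.2, 1.1, -0.5]), 2))  # 2.1
```

An assignment is accepted at a given confidence level when the observed Δ exceeds the critical Δ obtained from simulation.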

Spatial genetic structure
Spatial genetic structure (SGS) within the sampled populations was studied using the average coancestry coefficient ($\hat{\theta}_{xy}$) between all pairs of adult trees, estimated according to Loiselle et al. (1995). Five consecutive distance classes, from 0-196 m (class 1) to 784-980 m (class 5), with 196 m intervals, were used to plot the SGS correlograms. For each distance class, a 95% confidence interval was generated from 10,000 permutations of individuals within populations. These analyses were done using the program SPAGeDi version 1.2 (Hardy and Vekemans, 2002).
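A correlogram of this kind averages pairwise coancestry within fixed distance classes and compares each class mean with a permutation envelope. The Python sketch below assumes a precomputed symmetric matrix of pairwise coancestry values (for example, Loiselle et al. estimates) and planar coordinates in metres. Classes are taken as half-open intervals [k·196, (k+1)·196), and the default of 1,000 permutations is chosen for speed rather than the 10,000 used here. The function names and toy data are hypothetical.

```python
import math
import random


def class_means(coords, kinship, width=196.0, n_classes=5):
    """Mean pairwise coancestry per distance class [k*width, (k+1)*width)."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // width)
            if k < n_classes:  # pairs beyond the last class are ignored
                sums[k] += kinship[i][j]
                counts[k] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def permutation_envelope(coords, kinship, n_perm=1000, seed=1, **kwargs):
    """95% envelope per class from permuting individuals among locations."""
    rng = random.Random(seed)
    order = list(range(len(coords)))
    per_class = [[] for _ in range(kwargs.get("n_classes", 5))]
    for _ in range(n_perm):
        rng.shuffle(order)
        shuffled = [coords[i] for i in order]
        for k, value in enumerate(class_means(shuffled, kinship, **kwargs)):
            if not math.isnan(value):
                per_class[k].append(value)
    envelope = []
    for values in per_class:
        if not values:
            envelope.append((float("nan"), float("nan")))
            continue
        values.sort()
        lo = values[int(0.025 * (len(values) - 1))]
        hi = values[int(0.975 * (len(values) - 1))]
        envelope.append((lo, hi))
    return envelope


coords = [(0, 0), (100, 0), (500, 0)]
kinship = [[0.0, 0.2, 0.0], [0.2, 0.0, -0.1], [0.0, -0.1, 0.0]]
print(class_means(coords, kinship))
print(permutation_envelope(coords, kinship, n_perm=200))
```

Shuffling coordinates while keeping the coancestry matrix fixed is equivalent to permuting individuals among sampling locations. An observed class mean above the upper envelope indicates significant positive SGS in that class.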
In the presence of SGS, fixation index values within populations are overestimated because of the Wahlund effect (Bittencourt and Sebbenn, 2007). This intrapopulation fixation index ($\hat{f}_i$) can be corrected by removing the Wahlund effect through a relationship among F-statistics described in Bittencourt and Sebbenn (2007) and based on the formula $(1 - \hat{F}_{IT}) = (1 - \hat{F}_{IS})(1 - \hat{F}_{ST})$ (Wright, 1965). For this correction, Bittencourt and Sebbenn (2007) proposed replacing $F_{IT}$ and $F_{ST}$ by $\hat{f}_i$ and $\hat{\theta}_{xyi}$, respectively, such that the corrected intrapopulation fixation index is $\hat{f}_{Ni} = (\hat{f}_i - \hat{\theta}_{xyi})/(1 - \hat{\theta}_{xyi})$.
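Solving $(1 - f_i) = (1 - f_{Ni})(1 - \theta_{xyi})$ for $f_{Ni}$ gives $f_{Ni} = (f_i - \theta_{xyi})/(1 - \theta_{xyi})$. The Python sketch below applies that expression to the uncorrected $\hat{f}_i$ and first-class $\hat{\theta}_{xy}$ values reported in the results for MG and GO. It yields about 0.057 and 0.010, close to the published 0.056 and 0.009. The small differences presumably reflect rounding of the published inputs.

```python
def corrected_fixation_index(f_i, theta_xy):
    """Remove the SGS (Wahlund) component from an intrapopulation fixation index.

    Solves (1 - f_i) = (1 - f_N) * (1 - theta_xy) for f_N.
    """
    return (f_i - theta_xy) / (1 - theta_xy)


# Uncorrected f_i and first-class coancestry reported for MG and GO
for pop, f_i, theta in [("MG", 0.117, 0.064), ("GO", 0.077, 0.068)]:
    print(pop, round(corrected_fixation_index(f_i, theta), 3))
```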

Genetic diversity
Of the 42 primer pairs tested, nine (21.4%) amplified visible products and were polymorphic in D. alata (Table 1). Exclusive and rare alleles were found in all of the populations. Among the nine SSR loci studied, only the MS population showed a fixed allele, at locus DO31. The mean percentage of polymorphic loci across all populations was 96.3% and the mean number of alleles per population was 3.1, with no significant difference among populations (Table 2). All loci in the three populations were considered independent by linkage disequilibrium analysis (p-value > 0.0013). The mean observed ($\hat{H}_o$) and expected ($\hat{H}_e$) heterozygosities over all loci were 0.342 and 0.619, respectively, and the mean fixation index ($\hat{f}_i$) was 0.122 (Table 2). All populations deviated from Hardy-Weinberg equilibrium with significant inbreeding (p-value < 0.0056) (Table 2). A significant total fixation index ($\hat{F} = 0.535$; p-value < 0.0056) was associated with high, significant population differentiation estimates (…).

Mating system
The estimated outcrossing rate indicated that this species has a mixed mating system, with a predominance of outcrossing (Table 3). The multilocus outcrossing rate was significantly less than unity ($\hat{t}_m$ = 0.711; S.E. = 0.061), indicating selfing and deviation from random mating. The significant difference between the multilocus and single-locus outcrossing rates indicated mating among relatives (…) [$1 - \hat{r}_p$] derived from random matings and were related as half-sibs. The estimated number of effective pollen donors ($1/\hat{r}_p$) was very low (1.5).
Family outcrossing estimates ($\hat{t}_m$) ranged from 0.202 to 0.998, and 19 of the 25 families were significantly different from $\hat{t}_m$ = 1.000 (Table 3). Significant mating among relatives was observed in 12 families. Correlated matings ranged from 2.3% to 67.3%, and random matings ranged from 17% to 85.9%. Fourteen families had fewer than four effective pollen donors.
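Family-level percentages of this kind are commonly obtained by partitioning each family's progeny into selfed offspring ($1 - t_m$), full-sibs from correlated matings ($t_m r_p$) and half-sibs from random matings ($t_m(1 - r_p)$). The Python sketch below assumes that partition, which is not spelled out above. The family values are hypothetical. Under it, a family has fewer than four effective pollen donors exactly when $r_p > 0.25$.

```python
import math


def progeny_partition(tm, rp):
    """Expected share of selfed, full-sib and half-sib offspring in one family."""
    return {
        "selfed": 1 - tm,                    # selfing
        "correlated_full_sibs": tm * rp,     # outcrossed, same pollen donor
        "random_half_sibs": tm * (1 - rp),   # outcrossed, different donors
        "effective_pollen_donors": 1 / rp if rp > 0 else math.inf,
    }


# Hypothetical family: tm = 0.8, rp = 0.25
fam = progeny_partition(tm=0.8, rp=0.25)
for name, value in fam.items():
    print(name, round(value, 3))
```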

Minimal number of seed-trees
The coancestry coefficient within families ($\hat{\Theta}_{xy}$ = 0.171) was higher than expected for half-sibs ($\Theta_{xy}$ = 0.125; $N_{e(v)}$ = 4). The fixation index for the seeds was high and significant ($\hat{F}_s$ = 0.365). The minimum numbers of seed-trees needed for seed harvesting to retain reference effective population sizes of 50, 500 and 1000 were 18, 186 and 352, respectively, with n = 100 seeds per tree.

Paternity analysis
The parent pair exclusion probability over eight loci was 0.981. Paternity analysis revealed that, of the 300 seeds, 132 had a sampled pollen donor within the MS population. Of these 132 seeds, 51 originated from selfing and 81 from outcrossing. The 81 outcrossed seeds had 22 potential pollen donors, which dispersed their pollen over a mean distance of 610 m (maximum of 1388 m). Pollen donation to these 81 outcrossed seeds was dominated by a few trees, with seven trees contributing 62% of the pollen.
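Dispersal distances and donor dominance of this kind can be summarized from a list of (seed-tree, assigned pollen donor) pairs and tree coordinates. The Python sketch below assumes planar coordinates in metres and treats an assignment where donor and seed-tree coincide as selfing, which is excluded from the distances. The tree identifiers and coordinates are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def pollen_distances(assignments, coords):
    """Seed-tree to pollen-donor distances (m) for outcrossed seeds only."""
    return [math.dist(coords[mother], coords[father])
            for mother, father in assignments if mother != father]


def top_donor_share(assignments, k):
    """Fraction of outcrossed seeds sired by the k most frequent donors."""
    donors = Counter(father for mother, father in assignments if mother != father)
    top = sum(count for _, count in donors.most_common(k))
    return top / sum(donors.values())


coords = {"T1": (0.0, 0.0), "T2": (300.0, 400.0), "T3": (600.0, 0.0)}
seeds = [("T1", "T2"), ("T3", "T2"), ("T1", "T3"), ("T2", "T2")]
d = pollen_distances(seeds, coords)
print(round(mean(d), 1), max(d))      # 533.3 600.0
print(top_donor_share(seeds, k=1))
```

The same `top_donor_share` call with `k=7` expresses the share of outcrossed seeds sired by the seven most frequent donors.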

Spatial genetic structure
Analysis of the spatial genetic structure showed that populations MG and GO had significant coancestry coefficient estimates ($\hat{\theta}_{xy}$ = 0.064 and 0.068, respectively) in the first distance class, indicating SGS up to 196 m (Figure 2). The SGS analysis showed no significant coancestry coefficient estimates for the MS population (Figure 2). Based on the SGS values, the corrected intrapopulation fixation index values ($\hat{f}_{Ni}$) for the MG and GO populations were 0.056 and 0.009, respectively. These findings indicate that the uncorrected $\hat{f}_i$ values (MG = 0.117; GO = 0.077) were overestimated because of the Wahlund effect generated by previously undetected intrapopulation substructure.

Discussion
Dipteryx alata genetic structure and mating system
The genetic diversity values calculated here indicate high genetic diversity in the populations studied. The overall genetic diversity across all populations was similar to that of Dipteryx panamensis analyzed with nine microsatellite loci ($H_e$ = 0.520-0.604; Hanson et al., 2008).
Because it is corrected for the large number of private alleles among populations, the $\hat{G}'_{ST}$ value was nearly twice the values of $\hat{\theta}_P$ and $\hat{R}_{ST}$ (Hedrick, 2005). The high divergence among the D. alata populations in this study indicated population structuring, which is consistent with the existence of correlated matings, selfing and limited seed dispersal. The degree of structure among subpopulations of D. alata was very high compared with that observed in other native tree species of the Brazilian cerrado (Zucchi et al., 2005).
Since we found evidence of long-distance pollen dispersal, we suggest that the main factor structuring the populations is restricted seed dispersal. As SGS results from several processes, mainly restricted seed dispersal (Dick et al., 2008; Hardy et al., 2006), we suggest that gravity plays a greater role in seed dispersal than the animals, e.g., bats, that usually disperse D. alata fruits.
SGS subdivided the MG and GO populations into many circular subpopulations with a radius of 196 m, which resulted in elevated $\hat{f}_i$ values because of the Wahlund effect. By accounting for SGS within populations, we were able to remove the Wahlund effect from the newly estimated fixation index values for the MG and GO populations. These findings indicate that, besides the possible existence of biparental inbreeding and selfing in the populations, the high $\hat{f}_i$ values in the adult populations also resulted from SGS.
The MS population had a high, significant degree of selfing that may negatively affect heterozygosity and fitness (Lowe et al., 2005). The outcrossing rates determined here were similar to those of congeneric D. panamensis trees located in pasture ($\hat{t}_m$ = 0.806-0.865; Hanson et al., 2008), which agrees with the fact that all seed-trees sampled in this study are isolated in pasture. We suggest that isolation and low plant density may have reduced outcrossing rates in the MS population because of the low effectiveness of pollination vectors (Steffan-Dewenter and Westphal, 2008), as also observed in other pasture trees (Dick et al., 2003). Our pollen dispersal distances also agreed with those reported for other insect-pollinated tropical trees that occur at low densities (e.g., Dick et al., 2008). The low density of reproductive trees in the studied area resulted in a high number of dominant pollen donors, and such dominance has already led to bottlenecks in the seeds, reducing their effective size, as demonstrated by the high, significant fixation index ($\hat{F}_s$ = 0.365).

Implications for genetic conservation and breeding programs
Our data support the existence of high genetic diversity within and among populations of D. alata, indicating the possibility of conserving these populations in situ. The high divergence among populations indicates that the three studied populations should be treated as evolutionarily significant units (ESUs) and management units (MUs) in order to achieve adaptive evolutionary conservation (AEC) objectives in each population (Palsboll et al., 2007). Moreover, ex situ conservation strategies and restoration projects should be guided by the levels of genetic diversity and local adaptation to avoid problems related to maladaptation (McKay et al., 2005).
Two of the three populations showed significant levels of SGS, which generally results in a high fixation index. To mitigate the effects of SGS when establishing a minimum distance between individuals, we suggest that future studies sample all individuals of the same ontogenetic stage within a fixed area; this approach will provide a better understanding of SGS and its influence on genetic structure. The presence of SGS suggests that seed harvesting for genetic conservation and breeding programs should be done at a minimum distance of 196 m between seed-trees to avoid collecting seeds from related seed-trees.
The estimated outcrossing rate suggested the presence of a mixed mating system that is already leading to higher levels of inbreeding. We found evidence of long-distance gene flow via pollen that may reduce genetic drift in the small, fragmented populations of D. alata in the Brazilian cerrado. Nevertheless, the high reproductive dominance indicates that pollinators are probably not fulfilling their role because of the reduced number of reproductive trees and their isolation. The low number of effective pollen donors and the expected high number of full-sibs per seed-tree should be taken into account when harvesting seeds for breeding and conservation programs. In this case, to maintain stable population inbreeding levels in ex situ conservation programs, at least 18 seed-trees with 100 seeds each are required to retain a reference effective size of 50.