New microsatellite loci for annatto (Bixa orellana), a source of natural dyes from Brazilian Amazonia

Annatto (Bixa orellana) is a tropical crop native to the Americas with Amazonia as the likely center of origin of domestication. Annatto is important because it produces the dye bixin, which is widely used in the pharmaceutical, food, cosmetic and textile industries. A total of 32 microsatellite loci were isolated from a microsatellite-enriched genomic library, of which 12 polymorphic loci were used to characterize four populations of B. orellana and B. orellana var. urucurana, the wild relative. Genetic diversity estimates were higher for the wild populations than for the cultivated populations. Apparent outcrossing rates were also higher for the two wild populations than for the cultivated ones. These results indicate a mixed mating system for the species. All markers described herein have potential to be used in further studies evaluating the genetic diversity, population dynamics, domestication, breeding, and conservation genetics of annatto.


INTRODUCTION
Annatto (Bixa orellana L.; Bixaceae) is a tropical crop native to the Americas with Amazonia as the likely center of origin of domestication (Arce 1999, Clement et al. 2010, Moreira et al. 2015). The wild ancestor of cultivated annatto has recently been identified as B. orellana var. urucurana (Willd.) Kuntze ex Pilg. (Moreira et al. 2015). Historical evidence indicates extensive distribution and cultivation of annatto in the American tropics and its subsequent spread worldwide (Leal and Clavijo 2010). Brazil is the major producer of annatto and also hosts the greatest diversity of this species. Annatto is also produced in Peru, Kenya, the Dominican Republic, Colombia, Jamaica, Costa Rica, Suriname, and several countries in Asia (Akshatha et al. 2011). Annatto is commercially valuable due to its applications in the food and cosmetics industries, as a natural substitute for synthetic dyes (Nisar et al. 2015). It is the second most economically important natural colorant worldwide, and has gained attention for containing, apart from the dye, other substances important for human health, e.g., geranylgeraniol, tocotrienols and other carotenoids with antimicrobial, antioxidant and antiviral properties (Albuquerque and Meireles 2012), which can be used to treat human diseases, including leishmaniosis (Lopes et al. 2012). Annatto dye also plays a role in Brazilian culture, since it is still used by indigenous tribes for body painting and dyeing of clothes (Plotkin 1993).
Microsatellites, or simple sequence repeats (SSRs), are important tools to assess the genetic diversity and genetic structure of populations. They are widespread in eukaryotic genomes and very useful, mainly because of their codominant inheritance, high polymorphism, suitability for automated allele sizing, and cross-species transferability (Kalia et al. 2011, Vieira et al. 2016). Dequigiovanni et al. (2014) developed 10 polymorphic microsatellite markers for B. orellana L.; however, a larger number of markers can increase the accuracy of estimates of population genetic parameters. Thus, this study presents a new set of microsatellite loci for B. orellana and its wild relative B. orellana var. urucurana, with a view to generating useful information for conservation strategies and population genetic studies.
MATERIAL AND METHODS
Genomic DNA was extracted from Bixa orellana and B. orellana var. urucurana samples with the CTAB protocol (Doyle and Doyle 1990). A microsatellite-enriched library for B. orellana was developed following Billotte et al. (1999). Genomic DNA was digested with the enzyme AfaI (Integrated DNA Technologies, IDT, Coralville, USA) and the fragments resulting from digestion were ligated to Afa21 and Afa25 adapters. Fragments were pre-amplified by polymerase chain reaction (PCR) using the Afa21 adapter. Fragments containing repeats were selected with (CTT)10, (GT)10 and (TA)10 biotinylated oligos, and recovered with streptavidin-coated magnetic particles (Sigma-Aldrich, St. Louis, USA). Enriched DNA fragments were amplified, cloned into the pGEM-T Easy vector (Promega, Madison, USA) and transformed into XL1-Blue Escherichia coli competent cells (Stratagene, Santa Clara, USA). Ninety-two positive clones were sequenced using universal T7 and SP6 primers with a BigDye v3.1 terminator kit on an ABI 3130XL Genetic Analyzer automated sequencer (Applied Biosystems, Foster City, USA). The sequences containing microsatellite repeats were selected using WebSat (Martins et al. 2009). We considered dinucleotides with more than six repeats, and trinucleotides, tetranucleotides and pentanucleotides with three or more repeats. Primers were designed in PRIMER 3 (Rozen and Skaletsky 2000), considering sequences with 50-80% GC content, final products ranging from 130 to 350 base pairs (bp), and primer sizes from 18 to 22 bp. An M13 sequence tail was added to the 5' end of each forward primer following the Schuelke (2000) protocol.
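The repeat-selection criteria described above can be illustrated with a minimal Python sketch. This is not the WebSat implementation; it is a simple regular-expression search for perfect repeats that only assumes the thresholds stated in the text (at least seven repeats for dinucleotides, three or more for tri-, tetra- and pentanucleotides). The function names and the example sequence are hypothetical.

```python
import re

# Thresholds from the text: dinucleotides with more than six repeats
# (i.e. at least 7); tri-, tetra- and pentanucleotides with >= 3 repeats.
MIN_REPEATS = {2: 7, 3: 3, 4: 3, 5: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return all(
        motif != motif[:k] * (n // k)
        for k in range(1, n)
        if n % k == 0
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k}) captures one motif; \1{n,} requires n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip e.g. GTGT (really a GT repeat) and homopolymer runs.
            if not is_primitive(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical clone sequence with a (GT)8 and a (CTT)4 repeat.
example = "AAC" + "GT" * 8 + "CCA" + "CTT" * 4 + "GGA"
for start, motif, count in find_ssrs(example):
    print(f"position {start}: ({motif}){count}")
```

A sketch like this only finds perfect repeats; dedicated tools also handle compound and imperfect microsatellites, which this example deliberately ignores.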

G Dequigiovanni et al.
mM MgCl2, 2.5 pmol of forward and M13-labeled primers (FAM, HEX or NED dyes), and 5 pmol of reverse primers. The PCRs were carried out according to the Schuelke (2000) protocol, consisting of 94 °C (5 min), then 30 cycles at 94 °C (30 s)/$T_a$ (45 s)/72 °C (45 s) [$T_a$ = annealing temperature (Table 1)], followed by 8 cycles at 94 °C (30 s)/53 °C (45 s)/72 °C (45 s), and a final extension at 72 °C for 10 min. The quality of amplification was checked by electrophoresis in agarose gels (1.5%) stained with GelRed (Biotium, Hayward, USA). The PCR products were visualized on an ABI 3130XL sequencer (Applied Biosystems, Foster City, USA), and allele sizes were scored using the GeneScan™-500 ROX® Size Standard (Applied Biosystems, Foster City, USA) and analyzed with GENEMAPPER v4.0 software (Applied Biosystems, Foster City, USA). Descriptive statistics and Hardy-Weinberg equilibrium (HWE) were calculated using diveRsity (Keenan et al. 2013) for R (R Core Team 2015). Genotypic disequilibrium between pairs of loci was estimated using FSTAT (Goudet 2002). Monte Carlo permutations of alleles between individuals and a Bonferroni correction (α = 0.05) were used to test whether the estimates were significantly different from zero. The software Micro-Checker 2.2.1 (van Oosterhout et al. 2004) was used to identify possible genotyping errors resulting from stuttering or large allele dropout, and the presence of null alleles within the microsatellite data set, by performing 1000 randomizations.
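To make the Bonferroni step concrete, the sketch below computes the adjusted significance threshold for pairwise tests among loci. It assumes the plain form of the correction (nominal α divided by the number of locus pairs) applied to the 12 polymorphic loci; the exact adjustment FSTAT applies may differ, for example if tests are also counted per population. The function name is hypothetical.

```python
from math import comb


def bonferroni_threshold(n_loci, alpha=0.05):
    """Return (number of locus pairs, Bonferroni-adjusted alpha)."""
    n_pairs = comb(n_loci, 2)  # number of distinct pairs of loci
    return n_pairs, alpha / n_pairs


n_pairs, adjusted_alpha = bonferroni_threshold(12)
print(f"{n_pairs} pairwise tests, adjusted alpha = {adjusted_alpha:.6f}")
```

Under this assumption, a pairwise disequilibrium test is declared significant only if its P-value falls below the adjusted threshold rather than below 0.05.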
The distribution of genetic variation within and among populations was evaluated using locus-by-locus AMOVA in GenAlEx version 6.5 (Peakall and Smouse 2012). Wright's $F_{ST}$, also calculated in GenAlEx, was used to estimate population differentiation. When populations are under Wright's equilibrium, the outcrossing rate is a function of the within-population inbreeding coefficient (Wright 1965). The apparent outcrossing rate ($\hat{t}_a$) was therefore calculated for all populations according to Vencovsky (1994), as $\hat{t}_a = (1 - f)/(1 + f)$, where $f$ is the within-population inbreeding coefficient. Principal coordinate analysis (PCoA) was used to evaluate the dispersion of accessions with GenAlEx (Peakall and Smouse 2012).
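The relationship between the inbreeding coefficient and the apparent outcrossing rate can be sketched in a few lines of Python. The formula is the one given above, (1 - f)/(1 + f); the function names and the f values in the loop are illustrative assumptions and are not taken from Table 2.

```python
def apparent_outcrossing_rate(f):
    """Apparent outcrossing rate t_a = (1 - f) / (1 + f) under inbreeding equilibrium."""
    if f <= -1.0:
        raise ValueError("f must be greater than -1")
    return (1.0 - f) / (1.0 + f)


def inbreeding_from_outcrossing(t_a):
    """Inverse relation: f = (1 - t_a) / (1 + t_a)."""
    if t_a <= -1.0:
        raise ValueError("t_a must be greater than -1")
    return (1.0 - t_a) / (1.0 + t_a)


# Illustrative values only: no inbreeding, moderate inbreeding, complete inbreeding.
for f in (0.0, 0.5, 1.0):
    print(f"f = {f:.2f} -> t_a = {apparent_outcrossing_rate(f):.3f}")
```

With f = 0 the rate is 1 (complete outcrossing) and with f = 1 it is 0 (complete selfing), so intermediate values are read as evidence of a mixed mating system.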

RESULTS AND DISCUSSION
Thirty-two loci were amplified successfully (Table 1) from 92 positive clones sequenced from the library. Of these 32 loci, 12 were polymorphic in B. orellana and B. orellana var. urucurana populations (Table 2). This level of polymorphism (12 of 32 loci, 37.5%) is comparable to that of a previous study of Bixa orellana, in which Dequigiovanni et al. (2014) detected 10 polymorphic loci among 25 evaluated. Micro-Checker detected no genotyping errors due to stuttering or large allele dropout. The analysis also indicated that loci BorA5_2013, BorB1_2013, BorD1_2013, BorD2_2013, BorG11_2013, and BorH10_2013 might be affected by null alleles in cultivated populations. However, the excess of homozygosity underlying this signal may instead be attributable to inbreeding; therefore, none of the loci were excluded from the analyses.
Polymorphic loci were used to calculate descriptive statistics for each population (Table 2). For the wild B. orellana var. urucurana, the number of alleles per locus varied from 1 to 8. A lower number of alleles per locus was found for cultivated annatto, varying from 1 to 6 (Table 2). The average observed ($H_O$) and expected ($H_E$) heterozygosities were also higher in the wild than in the cultivated populations, with higher $H_E$ than $H_O$ values observed in both wild and cultivated populations. As a result, local inbreeding coefficients were high in all populations (Table 2). Similar results for cultivated accessions were reported by Dequigiovanni et al. (2014). Higher levels of genetic diversity in wild than in cultivated populations have also been found in other crops, due to genetic bottleneck effects during domestication, such as in tepary beans (Phaseolus acutifolius) (Blair et al. 2012, Gujaria-Verma et al. 2016), common beans (P. vulgaris) (Bitocchi et al. 2013), apricot (Prunus armeniaca) (Bourguiba et al. 2012), and sunflower (Helianthus annuus) (Mandel et al. 2011). However, this is not always the case, since in some crops, e.g., carrot (Daucus carota subsp. sativus), no decrease of genetic diversity occurred during domestication (Iorizzo et al. 2013).
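The descriptive statistics discussed above can be sketched for a single locus in one population as follows. The inbreeding coefficient follows the definition given in the Table 2 caption (f = 1 - H_O/H_E); expected heterozygosity is computed here in its simplest form, one minus the sum of squared allele frequencies, which is an assumption, since diveRsity may apply a sample-size correction. The function name and the genotypes (allele sizes in bp) are hypothetical.

```python
from collections import Counter


def diversity_stats(genotypes):
    """Compute A, H_O, H_E and f for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    # Observed heterozygosity: proportion of heterozygous individuals.
    h_o = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity (uncorrected): 1 - sum of squared allele frequencies.
    h_e = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    # Inbreeding coefficient; undefined for a monomorphic locus.
    f = 1.0 - h_o / h_e if h_e > 0 else float("nan")
    return {"A": len(allele_counts), "H_O": h_o, "H_E": h_e, "f": f}


# Hypothetical genotypes for four individuals at one locus.
sample = [(150, 150), (150, 154), (154, 154), (150, 150)]
stats = diversity_stats(sample)
print(stats)
```

In this made-up sample H_O is lower than H_E, so f is positive, the same pattern the text reports for both wild and cultivated populations.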
Deviation from Hardy-Weinberg equilibrium (HWE) was tested for all loci and populations. Ten loci deviated from HWE in B. orellana and five in B. orellana var. urucurana, owing to a deficit of heterozygotes (excess homozygosity), consistent with the higher $H_E$ than $H_O$ values. Deviations from HWE may occur because B. orellana has a mixed mating system and can tolerate both autogamy and allogamy (Rivera-Madrid et al. 2006, Valdez-Ojeda et al. 2010, Joseph et al. 2012). Similarly, Dequigiovanni et al. (2014) found deviations from HWE in eight out of ten loci analyzed. No significant linkage disequilibrium was detected for any pair of loci tested after Bonferroni correction. The apparent outcrossing rates estimated for all populations in this study indicated a mixed mating system for annatto, with much higher outcrossing rates observed for the two wild populations ($\hat{t}_a$ = 0.644 for Corumbiara; $\hat{t}_a$ = 0.759 for Ariquemes) than for cultivated annatto ($\hat{t}_a$ = 0.198 for São Francisco do Guaporé, RO; $\hat{t}_a$ = 0.355 for Rondon do Pará, PA). Notably, the commercial annatto plantation in São Francisco do Guaporé was much more uniform than the one in Rondon do Pará, which is on a more traditional type of farm; this is reflected in the lower apparent outcrossing rate of the former.
The AMOVA identified a higher proportion of genetic variation within (68%) than among populations (29%; $F_{ST}$ = 0.317, P < 0.001). The among-population component is nevertheless quite high, suggesting that population subdivision has a strong effect on genetic diversity. However, only 2% of the total variation was attributable to differences between wild and cultivated populations, suggesting considerable gene flow between these two types of populations, especially in Rondônia (Figure 1). The F-statistics ($F_{IS}$ = 0.366; $F_{ST}$ = 0.367; $F_{IT}$ = 0.597) also confirmed high levels of genetic structure. The cultivated population of Rondon do Pará was the most divergent, apparently indicating isolation by distance, while considerable gene flow was detected among the two wild populations and the one cultivated population in Rondônia.
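As a rough consistency check on the three F-statistics reported above, the sketch below applies Wright's standard identity (1 - F_IT) = (1 - F_IS)(1 - F_ST). The identity is not stated in the text and is brought in here as an assumption about how the three values relate; the function name is hypothetical.

```python
def implied_f_it(f_is, f_st):
    """F_IT implied by Wright's identity (1 - F_IT) = (1 - F_IS) * (1 - F_ST)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


# F_IS and F_ST as reported in the text.
value = implied_f_it(0.366, 0.367)
print(f"Implied F_IT: {value:.3f}")
```

The implied value is about 0.599, close to the reported 0.597. A small gap of this kind can arise from rounding or from averaging estimates across loci, so it should not be read as a discrepancy.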
In conclusion, the 12 polymorphic loci reported in this study proved to be powerful tools for assessing genetic diversity and genetic structure, as well as for domestication studies of B. orellana and B. orellana var. urucurana. Higher levels of genetic diversity and outcrossing rates were found for the wild than for the cultivated populations. Also, most of the variation detected by SSR markers was located within populations, which apparently have a mixed mating system. Loci appearing as monomorphic in these populations may be classified as polymorphic in other populations and should therefore not be discarded.

Figure 1. Principal coordinate analysis of the dispersion of two cultivated populations of Bixa orellana (Rondon do Pará and São Francisco do Guaporé) and two wild populations of B. orellana var. urucurana (Corumbiara and Ariquemes), using 12 newly developed microsatellite markers.

Table 1. Description of 32 Bixa orellana microsatellite loci, including loci names, GenBank accession numbers, annealing temperatures ($T_a$), repeat motifs and size range of each locus.

Table 2. Genetic characterization of 12 polymorphic SSR loci in Bixa orellana (cultivated) and B. orellana var. urucurana (wild) populations. Genetic diversity is described as number of alleles (A), observed ($H_O$) and expected ($H_E$) heterozygosities, and inbreeding coefficient ($f = 1 - H_O/H_E$).