Quantity, organization, and distribution of chloroplast microsatellites in all species of Eucalyptus with available plastome sequence

In this study, we quantify and document the distribution and organization of cpSSRs in the chloroplast genomes of 31 Eucalyptus species. Our sample included all previously sequenced plastomes of Eucalyptus species available from the NCBI online database. We processed the complete cpDNA sequences and identified mono-, di-, tri-, tetra-, penta-, and hexanucleotide cpSSRs, with the majority of cpSSRs classified as mononucleotide. After identifying the microsatellites, we classified them as occurring in coding or non-coding regions; cpSSRs were predominantly found in non-coding regions of cpDNA for all motif types. Penta- and hexanucleotide cpSSRs were the least frequent types of microsatellites. We also developed and virtually amplified 60 primer pairs that can be used in studies of Eucalyptus species. Thus, these cpSSR regions can be used in studies assessing the ecology, breeding, and conservation of the genus.


INTRODUCTION
Eucalyptus L'Hér. is the dominant genus in the Australian flora and, with approximately 700 species, the largest genus in the Myrtaceae family (Smith et al. 2003). The genus originates from the region in and around Australia, including Timor, Indonesia, Papua New Guinea, the Molucca Islands, Irian Jaya, and the southern Philippines, between 9° N and 44° S latitude (Eldridge et al. 1993). In 2015, Eucalyptus forest plantations represented about 10% of the approximately 291 million hectares of planted forests worldwide (FAO 2016, Sands 2005).
Because Eucalyptus is used extensively in forestry, plant ecologists, breeders, and geneticists have a significant interest in establishing fast and robust methods for Eucalyptus identification or species distinction. Currently, the use of marker-assisted approaches in Eucalyptus breeding programs is common (Mahajan and Gupta 2012, Fuchs et al. 2015). These approaches are powerful tools for phylogenetic and population research (Doorduin et al. 2011) and can be used to assist ecologists and breeders in their analyses (Steane et al. 1998, Freeman et al. 2001, McKinnon et al. 2001, McKinnon et al. 2010, Melotto-Passarin et al. 2011).

MC Andrade et al.
The development of molecular markers for Eucalyptus species improved dramatically after the complete sequencing of its nuclear genome (nDNA) (Myburg et al. 2014, Fuchs et al. 2015). Currently, microsatellite markers (SSRs) are widely used in. Although many nSSRs have been developed for the Eucalyptus genus (Kirst et al. 2005), the use of cpSSRs is still limited. Nevertheless, as with nDNA, cpDNA (i.e., cpSSR) has demonstrated great potential for use in phylogeography, analyses of DNA fingerprints, hybridization, and progeny analyses in tree species (Steane et al. 2005, Delgado et al. 2007, McKinnon et al. 2010). Because the majority of cpDNA evolves slowly, its overall utility for evolutionary and population genetic studies can be limited, particularly at lower taxonomic levels (Steane et al. 1991, Steane et al. 1998, Freeman et al. 2001, Shaw et al. 2005). As a result, many researchers have sought to use hypervariable regions of cpDNA to characterize the genetic variation of species (Shaw et al. 2007). In this context, cpSSRs have become a significant resource for genetic analyses (Weising and Gardner 1999) focused on ecology, systematics, conservation, and phylogenetics, as well as for breeding programs. As such, chloroplast genomes (cpDNA) can provide important data for ecological (Latouche-Hallé et al. 2003) and breeding studies of Eucalyptus species, such as estimates of the phylogenetic relationships within and between species (Whitock et al. 2003).
In this study, we aim to quantify and document the distribution and organization of cpSSRs in all 31 species of Eucalyptus that have an available plastome sequence. We also developed 60 potential primer pairs that can be used by researchers in a variety of fields of study.

In silico identification of microsatellites
This study is based on complete plastid genome sequences available from the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?opt=plastid&taxid=2759). From this database, all 31 chloroplast genomes of Eucalyptus spp. were used (Table 1). The complete cpDNA sequences were collected from NCBI and processed using the FastPCR 6.5.40.0 software (Kalendar et al. 2016) to identify the microsatellite regions (cpSSR) as mono-, di-, tri-, tetra-, penta-, or hexanucleotide. We considered only those repeats in which the motifs repeated as follows: mononucleotide repeats with a repeat length ≥ 8; dinucleotide with a repeat length ≥ 6; and tri-, tetra-, penta-, and hexanucleotide with a repeat length ≥ 3. After identification, we classified the microsatellites as occurring in coding (gene) or non-coding (intergenic) regions according to the information available in the NCBI database for each species.
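The thresholds above can be illustrated with a minimal Python sketch. It is not the FastPCR algorithm: it is a simple regular-expression scan for perfect repeats, and it assumes that "repeat length" means the number of times the motif is repeated. The names `MIN_REPEATS` and `find_ssrs` are hypothetical and introduced here only for illustration.

```python
import re

# Minimum number of motif repeats per motif size, taken from the thresholds
# in the text (assumption: "repeat length" is read as a repeat count).
MIN_REPEATS = {1: 8, 2: 6, 3: 3, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    'start' is a 0-based position in 'seq'. This is an illustrative scan,
    not a reimplementation of FastPCR.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those belong to the shorter motif class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence (not a real plastome fragment).
example = "GC" + "A" * 10 + "CG" + "AT" * 6 + "TT" + "AGC" * 3 + "T"
ssrs = find_ssrs(example)
```

Because the scan reports the leftmost match, a repeat may be reported in a shifted phase (for instance "GCA" rather than "AGC") depending on the flanking bases; dedicated SSR tools usually normalize motifs to handle this.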

Primer design and in silico PCR amplification
The complete chloroplast genome of Eucalyptus grandis (GenBank NC_014570) was screened for cpSSRs and 60 cpSSR flanking hypervariable regions were used to develop primer pairs. Primer pairs were designed using Primer3 (v.0.4.0) (Untergasser et al. 2012) based on the following criteria: annealing temperature between 57 and 63 °C; final amplification product of 100-350 bp; GC content of 20-80%; and primer size of 18-27 bp.
Virtual amplification was performed using the Sequence Manipulation Suite (http://www.bioinformatics.org/sms2/pcr_products.html; Stothard 2000) for all 31 Eucalyptus species. The Suite determines the expected product size for each primer pair.
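As a rough illustration of these design criteria, the following Python sketch checks a candidate primer and product against the size and GC limits listed above. It does not call Primer3 and does not estimate annealing temperature, which would need a thermodynamic model; the function names are hypothetical.

```python
def gc_percent(primer):
    """GC content of a primer, as a percentage of its length."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def primer_ok(primer):
    """True if the primer meets the size (18-27 bp) and GC (20-80%) limits."""
    return 18 <= len(primer) <= 27 and 20.0 <= gc_percent(primer) <= 80.0


def product_ok(size_bp):
    """True if the amplification product is within 100-350 bp."""
    return 100 <= size_bp <= 350


# Made-up primer sequence, used only to exercise the checks.
candidate = "ATGCATGCATGCATGCATGC"
candidate_passes = primer_ok(candidate) and product_ok(220)
```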

RESULTS AND DISCUSSION
All microsatellite regions in the cpDNA for the 31 studied Eucalyptus species were evaluated (Table 2). The total number of microsatellites observed for each species varied from 71 for E. melliodora to 135 for E. aromaphloia.
While we identified mono-, tri-, and tetranucleotide microsatellites for all species (Table 2), the majority of Eucalyptus cpSSRs were classified as mononucleotide. Currently there are no other studies that assess the distribution and organization of cpSSRs in Eucalyptus species; therefore, we provide a general comparison between our results and published studies on other plant species. For example, Vieira et al. (2015) analyzed the occurrence, type, and distribution of cpSSRs for 20 Bambuseae species and their results showed an average total of 141.8 cpSSRs, with a predominance of mono- and dinucleotide repeats. Similarly, Vieira et al. (2016a), studying Retrophyllum piresii Silba C.N. Page, reported that 94.5% of total cpSSRs were mono- and dinucleotides. Furthermore, George et al. (2015), studying the abundance and distribution of SSRs in 164 sequenced plastomes from a wide range of plants, demonstrated that plant species predominantly present mononucleotide repeats, as observed herein for Eucalyptus species.
Chloroplasts are well understood to originate from endosymbiosis between cyanobacteria and higher plants (McFadden 2001, Raven and Allen 2003). The theory of endosymbiosis argues that cyanobacteria, also called blue-green algae, are the ancestors of chloroplasts, despite the fact that cyanobacteria are prokaryotes (Clegg et al. 1994). For the studied Eucalyptus species, most mononucleotide cpSSRs were found in non-coding regions of cpDNA, with the exception of E. nitens and E. marginata, which present a more balanced proportion of mononucleotide microsatellites in coding and non-coding regions (Table 2). Gandhi et al. (2010) evaluated cpSSR dynamics in 12 species of the Brassicaceae family using data collected from GenBank. Their results revealed that, although 51% of cpDNA occurs in coding regions, the majority of SSRs were in non-coding regions, which contained twice as many SSRs as coding cpDNA regions.
Herein, dinucleotide microsatellites were found in cpDNA of 15 species, all in non-coding regions (Table 2). Tambarussi et al. (2009), Gandhi et al. (2010), and Melotto-Passarin et al. (2011) observed similar results for the Solanaceae, Brassicaceae, and Poaceae families, respectively. We also found that penta- and hexanucleotide microsatellites occur predominantly in non-coding regions and these two motifs were the least frequent types of microsatellites for all studied Eucalyptus species. Penta- and hexanucleotide cpSSRs were observed only in E. baxteri and E. diversicolor, respectively (Table 2), all in non-coding regions. Similar to the results presented herein, penta- and hexanucleotide repeats occurred at a low frequency in Bambuseae species, but in coding sequences (Vieira et al. 2015). For cpDNA of R. piresii, Vieira et al. (2016a) found no penta- or hexanucleotide cpSSRs, which is consistent with the results found for most of the studied Eucalyptus species.
Sixty primer pairs were developed and only 20% (12) of them occur in genic regions. All primers virtually amplified products around the expected size and displayed interspecific polymorphism. According to Sumathi and Yasodha (2014), until 2014 only 35 cpSSRs had been identified for different species of Eucalyptus. Therefore, our results indicate a wide range of regions in cpDNA that have yet to be explored (Doorduin et al. 2011). As cpDNA markers are nonrecombinant, uniparentally inherited, and present low rates of mutation in the plastome sequence (Provan et al. 2001, Vieira et al. 2016b), they can be used in phylogenetic analyses and in breeding and conservation programs for this genus.
Several studies have indicated that chloroplast genomes and cpSSRs have enormous potential (Zhang et al. 2017) and that they are an important tool for plant biologists and breeders in assessing genetic diversity (Doorduin et al. 2011), spatial genetic structure (Islam et al. 2015), evolutionary history, and hybridization in native and improved species (Ebert and Peakall 2009, Doorduin et al. 2011). Thus, our analysis provides a rich database that researchers can use to study these Eucalyptus species across a wide range of scientific disciplines.

Table 1. Plastome sequence and length for 31 Eucalyptus species

Table 2. Frequency (%) of the genic and intergenic cpSSRs based on motif size for each species