Discovery and characterization of SSR markers in Eugenia uniflora L. (Myrtaceae) using low coverage genome sequencing

Eugenia uniflora L. (Myrtaceae) is a tree species widely distributed in South America whose natural populations are suffering the effects of exploitation. In this study, we employed low coverage sequencing of the E. uniflora genome to mine SSR markers. The de novo assembly generated 2,601 contigs with an average length of 1,139 bp, spanning 3.15 Mb. A total of 76 dimer, 33 trimer and two compound SSR loci were identified. Twelve selected SSR loci were employed to genotype 30 individuals from two natural populations. A total of 73 alleles were observed (mean A = 6.1), the mean effective number of alleles was Ae = 3.91, mean HO was 0.23 and mean HE was 0.70. The mean Wright's within-population fixation index was FIS = 0.66, and significant deviation from HWE was observed in all loci except one. The FST between populations was 0.27. The levels of genetic diversity and structure estimated with these 12 SSR markers are in accordance with data from genetic studies performed on other tree species of the Pampa biome. The markers present moderate to high polymorphism and may be employed in studies supporting species conservation measures and breeding programs.


INTRODUCTION
Eugenia uniflora L. (2n = 22) is a tree species of the Myrtaceae family, native to the Cerrado, Atlantic Forest and Pampa biomes in Brazil, with economic and ecological importance. It has been employed in the pharmaceutical and cosmetic industries, with attested anti-inflammatory functions (Auricchio and Bacchi 2003, Costa et al. 2010). Traditionally, the infusion of its leaves is used against gastrointestinal illnesses, while its fruits are consumed fresh and as juice and ice cream (Lederman et al. 1992, Ferreira et al. 1987). This species is also used in the environmental recovery of degraded areas and is an important food source for bees (Silva and Pinheiro 2007), while its wood is widely used by rural populations for heating residences and manufacturing fence poles (Costella et al. 2013). Currently, there are few orchards established for the economic use of this species (Almeida et al. 2012).
Based on flow cytometry analysis, E. uniflora has a predicted haploid genome size of only 0.251 pg of DNA, or about 244.99 Mb (Costa et al. 2008). With the advent of next generation sequencing (NGS) platforms, drafting such small genomes becomes an attractive initiative towards generating genomic information of great importance for biotechnological exploitation, conservation and breeding of non-model tree species. NGS platforms are quite useful for generating low coverage genome sequencing data. At a relatively low cost, this strategy has enabled the discovery of novel repetitive elements in the barley genome (Wicker et al. 2008), the identification of homologous genes among Dipteran species (Rasmussen and Noor 2009), the characterization of the whole plastid genome of a milkweed species (Straub et al. 2011) and the discovery of genomic SSR molecular markers (Staton et al. 2015).
In this study we report the discovery of a large set of SSR loci based on low coverage genome sequencing data, and the characterization of 12 genomic SSR markers for E. uniflora. The characterized markers presented moderate to high polymorphism when employed for genotyping adult individuals from two Pampean populations of E. uniflora. They will allow the genetic diversity of natural populations to be assessed in order to better understand population dynamics, plan reliable conservation measures and advance breeding programs for this species.

SAMPLING AND DNA EXTRACTION
Total genomic DNA was isolated from healthy leaves of a single adult plant of Eugenia uniflora L. (Myrtaceae) collected in a natural population within the Pampa biome in southern Brazil (30°20'05.00"S, 54°21'44.00"W). A voucher of the collected individual was deposited in the Herbarium Bruno Edgar Irgang (HBEI) of the Universidade Federal do Pampa, Campus São Gabriel (voucher HBEI1150). Total DNA was isolated with the DNeasy® Plant Mini Kit (Qiagen), following the manufacturer's instructions. The quality and the amount of the isolated DNA were evaluated on a NanoVue™ Plus Spectrophotometer (GE Healthcare) and through electrophoresis on a 1.0% agarose gel.

NGS SEQUENCING AND DE NOVO ASSEMBLY
Total genomic DNA was sheared into fragments of about 300 bp using Biorruptor® (Thermo Fisher Scientific) and the genomic libraries were built using the IonChef® (Thermo Fisher Scientific) system following the manufacturer's specifications. DNA fragments were sequenced on an Ion 314™ microchip using the Ion Torrent Personal Genome Machine (PGM; Thermo Fisher Scientific) and the Ion PGM™ 200 Sequencing Kit following the manufacturer's specifications. After sequencing, the sequence reads were filtered within the PGM software, removing low quality and polyclonal sequences. All PGM-filtered data were exported as a FASTQ file that was used for the subsequent bioinformatics analysis.
The filtered FASTQ sequences obtained from the PGM software were used for a de novo assembly of E. uniflora sequences using SPAdes 3.09 (Bankevich et al. 2012), generating contigs with a minimum size of 1,000 bp.

DISCOVERY AND CHARACTERIZATION OF SSR MARKERS
The software SSRLocator (Maia et al. 2008) was used to find di- and trinucleotide repeat motifs in the obtained contigs. The default parameters of SSRLocator were employed to identify SSR loci with a minimum of 6 repetitions. Primers for the identified SSR loci were designed using the Primer3 software (Untergasser et al. 2012) (Table SI).
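The search criterion described above (di- and trinucleotide motifs with at least 6 repetitions) can be illustrated with a short regular-expression scan. This is only a sketch under stated assumptions: it is not SSRLocator's algorithm, the names `SSR` and `find_ssrs` are hypothetical, homopolymer runs are explicitly skipped, and compound loci are not detected.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    contig: str   # contig identifier (hypothetical naming)
    start: int    # 0-based start of the repeat tract
    motif: str    # repeated unit, e.g. "AT" or "CAG"
    repeats: int  # number of consecutive copies of the motif


def find_ssrs(contig_id: str, seq: str, min_repeats: int = 6) -> List[SSR]:
    """Report perfect di- and trinucleotide tracts with >= min_repeats copies."""
    seq = seq.upper()
    hits = []
    for size in (2, 3):
        # One captured motif of `size` bases followed by at least
        # (min_repeats - 1) further copies of the same motif.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAAAA, not a di/tri SSR
            hits.append(SSR(contig_id, match.start(), motif,
                            len(match.group(0)) // size))
    return sorted(hits, key=lambda hit: hit.start)
```

Calling `find_ssrs("contig_1", sequence)` returns the tracts ordered by position; the flanking sequence around each tract would then be passed to a primer design tool.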
Potentially amplifiable SSR loci were tested for amplification in silico using the software SPCR (Cao et al. 2005). In silico amplification was performed using the contigs obtained from the present E. uniflora sequencing as template DNA (Figure 1). Using this strategy, we were able to identify primer pairs that amplify a single locus of the E. uniflora genome within the expected size range and to discard primer pairs generating multi-locus amplifications and unfeasible band patterns.
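The screening logic described above can be sketched as follows. This is a minimal illustration that assumes exact primer matches only; it does not reproduce SPCR's algorithm, and the function names (`in_silico_amplicons`, `is_single_locus`) and the `max_size` cap are hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> List[int]:
    """Return every (possibly overlapping) start position of needle."""
    positions, i = [], haystack.find(needle)
    while i != -1:
        positions.append(i)
        i = haystack.find(needle, i + 1)
    return positions


def in_silico_amplicons(contigs: Dict[str, str], forward: str, reverse: str,
                        max_size: int = 1000) -> List[Tuple[str, int, int]]:
    """List (contig_id, start, size) for every exact-match product."""
    forward, reverse = forward.upper(), reverse.upper()
    # A product can lie on either strand, so both orientations are checked.
    orientations = ((forward, reverse_complement(reverse)),
                    (reverse, reverse_complement(forward)))
    products = []
    for contig_id, seq in contigs.items():
        seq = seq.upper()
        for left, right in orientations:
            for start in find_all(seq, left):
                for end in find_all(seq, right):
                    size = end + len(right) - start
                    # The right primer site must lie downstream of the left one.
                    if len(left) + len(right) <= size <= max_size:
                        products.append((contig_id, start, size))
    return products


def is_single_locus(products: List[Tuple[str, int, int]],
                    min_size: int, max_size: int) -> bool:
    """Keep a primer pair only if it yields one product of the expected size."""
    return len(products) == 1 and min_size <= products[0][2] <= max_size
```

A primer pair would be retained when `is_single_locus` returns `True` and discarded when the list of products is empty or contains more than one entry.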
Twelve loci with dimer and trimer motifs that revealed virtual amplification of a single locus with alleles within the expected size range (Table I) were tested in 30 individuals collected from two natural populations of E. uniflora located in the Pampa biome, Rio Grande do Sul State, Brazil. Populations SG (n = 12) and AL (n = 18) are about 200 km distant from each other and represent two characteristic forest formations that naturally occur in the Brazilian Pampa (Roesch et al. 2009). Population SG represents a "capão" formation (island of trees within the grassland, Roesch et al. 2009), while population AL represents a gallery forest. DNA was isolated from healthy leaves of each sampled plant using the DNeasy® Plant Mini Kit (Qiagen), following the manufacturer's instructions.
SSR markers were amplified through PCR in a final volume of 25 μL of reaction mix, containing about 30 ng of DNA, 0.25 μM of buffer, 0.5 μM of MgCl2, 1 U of Taq DNA polymerase (Invitrogen®), 0.05 μM of each dNTP, 0.125 μM of each primer and 0.2 μg/μL of BSA. Amplifications were carried out at 95°C for 5 min, with annealing at temperatures ranging from 48°C to 51.4°C (see Table I) for 1 min and extension at 72°C for 1 min, for a total of 30 cycles, followed by a final extension step at 72°C for 20 min. Alleles of each individual were resolved through electrophoresis on 6% polyacrylamide gels. Gels were stained with GelRed® and allele sizing was performed by comparison to a 100 bp ladder.
Total number of alleles (A), effective number of alleles (Ae), observed heterozygosity (HO), expected heterozygosity (HE), Wright's within-population fixation index [FIS = (HE - HO)/HE], and deviation from Hardy-Weinberg equilibrium (HWE) were estimated for each locus in each population and overall. Differentiation between populations was estimated using the AMOVA approach (FST). All estimations were performed using the software GenAlEx 6.4 (Peakall and Smouse 2006, 2012).
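For readers who want to see how the per-locus parameters listed above relate to each other, the sketch below computes them from diploid genotypes. It is an illustration only, not the GenAlEx implementation: it assumes the uncorrected estimator HE = 1 - sum(p_i^2) and the standard definition Ae = 1 / sum(p_i^2), neither of which is spelled out in the text, and the function name `locus_stats` is hypothetical. Only the FIS formula is taken directly from the paragraph above.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def locus_stats(genotypes: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize one locus from diploid genotypes given as (allele1, allele2)."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    sum_p2 = sum(p * p for p in freqs)           # expected homozygosity
    ho = sum(a != b for a, b in genotypes) / n   # observed heterozygosity
    he = 1.0 - sum_p2                            # assumed: uncorrected HE
    return {
        "A": len(counts),                        # number of distinct alleles
        "Ae": 1.0 / sum_p2,                      # assumed: Ae = 1 / sum(p_i^2)
        "HO": ho,
        "HE": he,
        # FIS = (HE - HO) / HE, as defined in the text; undefined if HE == 0.
        "FIS": (he - ho) / he if he > 0 else float("nan"),
    }
```

As a quick plausibility check on the formula, inserting the overall means reported in this study (HE = 0.70, HO = 0.23) gives (0.70 - 0.23)/0.70 ≈ 0.67, close to the reported mean FIS of 0.66; the two values need not coincide exactly because a mean of per-locus ratios differs from a ratio of means.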

SEQUENCING OUTPUT
The low coverage genome sequencing yielded around 7.0 Gb of sequence reads that were used in the de novo assembly. After assembly and exclusion of redundant regions, a total of 2,601 contigs were generated, with an average length of 1,139 bp (N50 length of 1,168 bp) and a cumulative length of 3.15 Mb.
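The summary statistics quoted above (number of contigs, average length, N50 and cumulative length) can be derived from a list of contig lengths. The sketch below is a generic illustration, not the procedure used by SPAdes or by the authors; the function name `assembly_stats` is hypothetical, and N50 is taken here as the length of the shortest contig in the smallest set of longest contigs that covers at least half of the assembly.

```python
from typing import Dict, Iterable


def assembly_stats(lengths: Iterable[int]) -> Dict[str, float]:
    """Return contig count, cumulative length, mean length and N50."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running, n50 = 0, ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the cumulative length
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered),
        "n50_bp": n50,
    }
```

The contig lengths themselves could be read from the assembler's FASTA output with any sequence parser.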

DISCOVERY OF SSR LOCI
Using the selected parameters, a total of 76 dinucleotide repeats, 33 trinucleotide repeats and two compound SSR loci (i.e. di- and trinucleotide repeats as SSR motif) were identified within the contigs of the present genome draft. After the in silico test for amplification, 74 out of the 111 loci were considered viable, presenting amplification of a single locus within the expected range, and were considered putative informative SSR markers. Repeat motifs, forward and reverse primers, annealing temperature and GenBank ID number of the loci are listed in Table I (12 characterized SSR markers) and Table SI (62 not characterized SSR loci).

CHARACTERIZATION OF SSR MARKERS
All twelve tested SSR markers were polymorphic in population AL and overall. However, amplification of loci Pit2, Pit8 and Pit13 failed in population SG (Table II). Overall, a total of 73 alleles were observed, ranging from 3 to 12 alleles per locus (mean A = 6.1), while the mean effective number of alleles was Ae = 3.91, ranging from 2.27 to 8.49 (Table II). Estimations of HO ranged from 0.00 to 0.57 (mean HO = 0.23), while HE ranged from 0.57 to 0.91 (mean HE = 0.70). Wright's within-population fixation index (FIS) ranged from 0.34 to 1.00 (mean FIS = 0.66). A significant deviation from HWE (p < 0.05) was observed in all loci, except for Pit13 (Table II). At population level, the number of alleles ranged from three to 11 in population AL and from two to seven in population SG. The effective number of alleles ranged from 2.00 to 8.76 (mean Ae = 3.43) in population AL and from 1.22 to 5.54 (mean Ae = 2.67) in population SG. Estimations of observed heterozygosity ranged from HO = 0.00 to HO = 0.50 (mean HO = 0.22) in population AL and from HO = 0.00 to HO = 0.67 (mean HO = 0.27) in population SG. The expected heterozygosity ranged from HE = 0.51 to HE = 0.91 (mean HE = 0.67) in population AL and from HE = 0.19 to HE = 0.86 (mean HE = 0.60) in population SG. The estimations of Wright's within-population fixation index in population AL ranged from FIS = 0.33 to FIS = 1.00 (mean FIS = 0.66), while in population SG they ranged from FIS = 0.19 to FIS = 1.00 (mean FIS = 0.60). Eleven out of the 12 loci presented significant deviation from HWE in population AL. In population SG, four out of the eight tested loci presented significant deviation from HWE. All estimations, overall and for each population, are summarized in Table II. The AMOVA approach revealed statistically significant (p < 0.001) differentiation between populations, FST = 0.27 (Table III).

DISCUSSION
The use of low coverage whole genome sequencing has proved to be useful to generate SSR markers for different hardwood species, although the proportion of identified polymorphic loci with feasible interpretation of the alleles is relatively low (Khodwekar et al. 2015). In this study, 74 out of 111 SSR loci were selected based on their in silico single-locus amplification with a feasible banding pattern. Using a low coverage whole genome sequencing approach for the development of SSR markers, Khodwekar et al. (2015), searching for [...] characterized seven SSR [...]
[...] Nybom (2004) for plant species according to their life traits, the SSR markers we developed for E. uniflora presented overall estimations of HE (mean HE = 0.70, ranging from 0.57 to 0.91) within the range determined for long-lived perennial species (mean HE = 0.68), widespread species (mean HE = 0.62), species with a mixed breeding system (mean HE = 0.60), species of early successional status (mean HE = 0.46), and species with ingested seed dispersal (mean HE = 0.73). On the other hand, estimations of HO (mean HO = 0.23, ranging from 0.00 to 0.57) were lower than the summarized data for long-lived perennial species (mean HO = 0.63), widespread species (mean HO = 0.57), species of early successional status (mean HO = 0.39), and species with ingested seed dispersal (mean HO = 0.72). Ferreira-Ramos et al. (2008) [...]
Only a few investigations of the population genetics of tree species growing in the Brazilian Pampa have been reported. These studies reported low levels of genetic diversity and high levels of inbreeding in Pampean populations of Schinus molle (Lemos et al. 2015) and Luehea divaricata (Nagel et al. 2015). In addition, Stefenon et al. (2016) showed that this fact may have led to a reduction of population fitness in these species. Thus, the comparatively lower estimations of genetic parameters obtained with the 12 SSR markers validated in this study are likely characteristic of the isolated small forest formations found in the Brazilian Pampa and reflect a trend across different tree species.
The SSR markers validated in this study are important tools that can be employed for identifying the genetic control of key biotechnological and horticultural traits and for characterizing the genetic diversity and structure of the natural remnants. They will also enable the wide application of marker-assisted and genomic selection, which may promote the establishment of commercial orchards with improved cultivars of the species. Based on the results of this study, it is reasonable to speculate that we may obtain a large number of informative molecular markers among the 62 SSR loci we discovered for E. uniflora through low coverage whole genome sequencing and did not characterize in this study.

Figure 1 - Virtual electrophoresis gel from in silico amplification of three SSR loci discovered using low coverage sequencing of the Eugenia uniflora genome and selected for genotyping of 30 individuals of natural populations of the species. MW represents the molecular weight ladder. The arrow indicates the amplified SSR allele. For these three loci, a feasible amplification within the expected size is observed, leading to their selection.

TABLE I Characterization of 12 SSR markers for Eugenia uniflora including primer sequences (forward and reverse), repeat motif, annealing temperature (Ta), length of the sequenced fragment, and GenBank accession number (GenBank ID).

TABLE II Genetic parameters estimated for Eugenia uniflora based on 12 SSR markers characterized in this study, over all populations and at population level. Estimations include the number of samples (N), number of alleles per locus (A), effective number of alleles (Ae), observed (HO) and expected (HE) heterozygosities, Wright's within-population fixation index (FIS), and statistical significance of the deviation from Hardy-Weinberg equilibrium (HWE).
Estimations for Pit2, Pit8 and Pit13 are not presented for population SG because amplification failed in all individuals for these loci. Statistical significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05; ns: not significant.

TABLE III Summary of the analysis of molecular variance (AMOVA) for all populations, based on 12 microsatellite markers.

An Acad Bras Cienc (2019) 91(1) e20180420