Genetic variability revealed by microsatellite markers in a germplasm collection of Jatropha curcas L. in Brazil: an important..

We evaluated the genetic variability of a collection of Jatropha curcas germplasm, represented by 93 accessions, using microsatellite markers. Of the 60 markers tested, five detected polymorphisms, with a total of 11 alleles and a mean of 2.2 alleles per locus. These five markers enabled the quantification of genetic variability through estimates of expected (He=0.42) and observed (Ho=0.64) heterozygosity, the Shannon-Weaver index (H’=0.62), the coefficient of inbreeding (ƒ=-0.44) and the formation of 11 clusters. In parallel, 14 accessions randomly sampled among the 93, each represented by seven plants, were analyzed with the same five markers to quantify the variability within and between accessions. Most of the genetic variation (92.58%) was contained within the accessions. These analyses revealed, for the first time, substantial genetic variability to be explored in this collection. The accessions UFVJC 05, 07, 12, 18, and 53 showed substantial variability among them, with potential to constitute a base population for the breeding program.


INTRODUCTION
Oil price volatility, combined with the need to reduce greenhouse gas emissions, has boosted global demand for biofuels. Among the potential species for the production of biofuels, Jatropha curcas L. stands out because its seeds have a high content (36.2%) of high-quality oil (Freitas et al. 2016). These aspects have led to rapid expansion of cultivated areas and demand for improved cultivars (Sorrel et al. 2010).
Despite its great potential, J. curcas is still a species undergoing domestication, but with considerable variability to be explored. However, information regarding genetic variability and population structure is still limited (Bressan et al. 2012) and breeding programs are rare compared to other oilseed species (Dias et al. 2012, Pecina-Quintero et al. 2014).
The initial phase of a breeding program involving any kind of plant species requires a germplasm collection, functioning as a repository of genes for the future development of varieties. Therefore, the success of an improvement program depends on knowledge of the available genetic variability, which will allow the efficient selection of different genotypes to produce hybrids and similar genotypes to produce lines (Pecina-Quintero et al. 2014).
Molecular markers are suitable tools for the characterization of genetic variability in germplasm collections. Microsatellites, also known as Simple Sequence Repeat (SSR) markers, have high reproducibility, codominant inheritance and high polymorphism. These markers have been successfully used to characterize the genetic variability of J. curcas (Pamidimarri et al. 2008, Sun et al. 2008, Sudheer et al. 2009, Rosado et al. 2010, Wen et al. 2010, Bressan et al. 2012, Na-ek et al. 2011, Pecina-Quintero et al. 2014, Sinha et al. 2015, Santos et al. 2016, Vásquez-Mayorga et al. 2017, Gangapur et al. 2018). In all of these studies, the genetic variability revealed for this species has been considered low.
The present study aimed to quantify the genetic variability present in a germplasm collection of J. curcas, composed of 93 accessions, using microsatellite markers. This study is part of the strategy to identify superior genotypes for cultivar development.

Plant material
The germplasm collection of J. curcas of the Federal University of Viçosa (UFV), located in the Araponga Experimental Farm (lat 20° 39' S, long 42° 32' W and alt 823 m asl), in the municipality of Araponga, MG, Brazil, is composed of accessions originating from different Brazilian geographic regions and from abroad (Table 1), all propagated by seeds. Currently, this collection is composed of 93 accessions (1504 plants) and is installed in modules of five trials in randomized block design, with four replications and 4-plant plots in 2 x 2 m spacing, with two common controls (Freitas et al. 2011, Freitas et al. 2016).
Samples of young and fully developed leaves from the 93 accessions were collected, each with seven sub-samples (plants), totaling 658 plants sampled. The leaves were wrapped in identified aluminum foil and placed in styrofoam boxes with ice for transportation to the Federal University of Viçosa, where they were stored at -80 °C.

DNA extraction
DNA extraction was conducted at the Laboratory of Forest Pathology of the UFV, following a protocol for eucalyptus modified from Doyle and Doyle (1990). The modification concerned the maceration of the samples, in the following steps: after removal of the central and secondary veins, the leaves were placed in 2 mL microcentrifuge tubes with metal beads and 700 μL of extraction buffer (100 mM Tris-HCl pH 8.0, 20 mM EDTA, 1.4 M NaCl, 2% (w/v) CTAB, 2% (w/v) PVP and 0.4% (v/v) β-mercaptoethanol, the latter kept separate from the other components). The samples were macerated with a TissueLyser II (Qiagen) and incubated in a water bath at 65 °C for 30 minutes. After incubation, 500 μL of chloroform-isoamyl alcohol (24:1) was added to the tubes, which were manually inverted several times. The tubes were then centrifuged at 12,000 rpm for 5 min. The supernatants were transferred to new tubes and the extraction with chloroform was repeated. The supernatants (~500 μL) were transferred to fresh tubes and 0.9 volumes (450 μL) of cold isopropanol were added. The precipitated DNA was washed twice with 500 μL of cold 70% and 95% ethanol. The DNA was dried at room temperature for 1 hour, dissolved in 50 μL of TE (10 mM Tris and 1 mM EDTA, pH 8.0) plus RNAse (10 μg mL⁻¹) for 2 hours at 37 °C and then stored at -20 °C. DNA was quantified with a NanoDrop spectrophotometer (Thermo Scientific) and final concentrations were standardized to 10 ng mL⁻¹.

Microsatellite molecular markers
Sixty pairs of microsatellite primers were tested (see Table 2). All primers used were previously reported in the literature on J. curcas: some were designed from microsatellite loci derived from genomic sequences and species-specific ESTs by Bressan et al. (2012), others from partial genomic sequences developed initially for cassava by Wen et al. (2010), and others were developed by Sudheer Pamidimarri et al. (2009) to differentiate toxic and non-toxic J. curcas genotypes. Polymerase chain reactions (PCR) were performed in a volume of 20 μL containing 50 ng of DNA, 1x Taq DNA polymerase buffer, 100 μM of each dNTP, 1.5 mM MgCl2, 0.2 μM of each primer and 1.0 U of Taq DNA polymerase (Life Science). Amplifications were performed in an MJ Research PTC 100 thermocycler with an initial denaturation at 94 °C for 3 min, followed by 40 cycles of denaturation at 94 °C for 1 min, annealing for 1 min at the specific temperature of each primer pair and extension at 72 °C for 1 min, with a final extension at 72 °C for 8 min. PCR products were separated on a 6% denaturing polyacrylamide gel and visualized by silver staining, according to Creste et al. (2001).

Variability between and within accessions
To quantify the genetic variability among the 93 accessions, one individual per accession was selected from the seven collected, based on the highest DNA concentration and the best DNA quality. The most polymorphic primers were then selected to analyze the variability between and within accessions. For this analysis, DNA was extracted from seven plants of each of 14 accessions randomly selected from the 93 accessions (Table 2).

Statistical analyses
The markers were coded as codominant, with numbers assigned to the alleles. Thus, when a locus presented three alleles, the codes 11, 22 and 33 were attributed to the homozygotes and 12, 13 and 23 to the heterozygotes. Popgene software version 1.31 (Yeh et al. 1999) was used to estimate genetic variability statistics such as allele frequencies, number of alleles, Nei's (1972) genetic distance and the coefficient of inbreeding (ƒ). Analysis of molecular variance (AMOVA, Excoffier et al. 1992) and polymorphism information content (PIC, Botstein et al. 1980) were performed using Genes software (Cruz 2013). For the construction of a circular dendrogram, the Mega7 software (Kumar et al. 2016) was used to perform UPGMA clustering from Nei's (1972) genetic distance. The dendrogram was interpreted by considering the points of abrupt change in the cluster fusion levels.
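The genotype coding described above can be illustrated with a minimal Python sketch that turns two-character codes into allele frequencies, number of alleles (na), effective number of alleles (ne) and observed and expected heterozygosity. The function name, the example scores and the use of the uncorrected estimator He = 1 - Σp² are assumptions made for illustration; they are not taken from Popgene, which may apply small-sample corrections.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one codominant locus from two-character genotype codes.

    Each code holds the two alleles of a diploid plant, e.g. "11" is a
    homozygote and "12" a heterozygote (assumes fewer than ten alleles).
    """
    alleles = [allele for code in genotypes for allele in code]
    counts = Counter(alleles)
    freqs = {a: c / len(alleles) for a, c in sorted(counts.items())}
    sum_p2 = sum(p * p for p in freqs.values())
    heterozygotes = sum(1 for code in genotypes if code[0] != code[1])
    return {
        "freqs": freqs,                            # allele frequencies
        "na": len(freqs),                          # number of alleles
        "ne": 1 / sum_p2,                          # effective number of alleles
        "Ho": heterozygotes / len(genotypes),      # observed heterozygosity
        "He": 1 - sum_p2,                          # expected heterozygosity (uncorrected)
    }


# Hypothetical scores for ten plants at a three-allele locus.
example = ["11", "12", "12", "13", "22", "23", "23", "12", "13", "33"]
summary = locus_summary(example)
print(summary)
```

In this hypothetical example seven of the ten plants are heterozygous, so Ho (0.70) exceeds He (0.665), the same direction of difference reported for most loci in this study.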

RESULTS
Among the 60 microsatellite markers tested in the 93 accessions, five were able to detect polymorphisms (Table 2, underlined); therefore, these were used in the analysis of genetic variability.

Genetic variability among the 93 accessions
The locus CESR 0756 was the only one of the five analyzed in which three alleles were detected in the 93 accessions. Two alleles were detected at each of the JCENA 87, JCDS 10, SSRY 107 and SSRY 127 loci; JCENA 87 was the least polymorphic according to allele frequency, with the frequency of allele A1 close to 100%. The five microsatellite loci used to evaluate the accessions generated a total of 11 alleles (na), with an average of 2.2 alleles per locus. The number of effective alleles (ne) ranged from 1.01 to 2.71, with a mean of 1.91 (Table 3). Because the A2 allele of locus JCENA 87 had a frequency lower than 0.05 (P < 0.05), it can be considered rare; it occurred in only one individual of the accession UFVJC 18.
Estimates of expected (He) and observed (Ho) heterozygosity were used to quantify the genetic variability among the 93 accessions. Ho ranged from 0.02 to 0.92, with a mean of 0.64, and He ranged from 0.02 to 0.64, with a mean of 0.42 (Table 3), which indicated a possible heterozygous origin of the collection accessions. For all loci (except JCENA 87), Ho was higher than He, revealing an excess of heterozygotes relative to that expected under Hardy-Weinberg equilibrium. For genetic improvement purposes, this result demonstrated the presence of heterozygotes able to be explored in this collection.

   Name       Forward primer         Reverse primer
3  CESR 1041  TTGCTGAAGCCCTTTCTAT    CAGTGTTGAGATCATAGCGA
3  CESR 1042  TTGGATTCCCTATGAACAAC   TTTGTCTGTCGAATCCTCTC
3  CESR 1044  TTGTCGAAGCTAAGGATTTC   CCATTCTTTCTTCCTTTGTG
3  CESR 1050  TTTCCACACATCAGCGGC     ATAAACCTTCAAACGAGCAA
3  CESR 1055  TTTGAGAGGTGGCAATAACT   GTCACAACCGGCAATTAG
3  NS

RL Souza et al.

The coefficient of inbreeding (f) measures the level of homozygosity in a population and is used to verify whether crossing among related individuals has occurred. Here, f values ranged from -0.005 to -0.85 (Table 3). Negative f values are interpreted as null inbreeding, suggesting that there were no crosses between related individuals in the collection.
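The relation between f and the two heterozygosity estimates can be sketched as follows, assuming the common definition f = 1 - Ho/He with the uncorrected He = 1 - Σp²; the exact estimator implemented in Popgene may differ. The function name and the two example loci are hypothetical. The first example mimics the situation described for JCENA 87, a rare allele carried by a single heterozygous plant among 93.

```python
def inbreeding_coefficient(genotypes):
    """Return f = 1 - Ho/He for one locus, or None if the locus is monomorphic.

    genotypes is a list of (allele, allele) pairs, one pair per diploid plant.
    """
    n_plants = len(genotypes)
    counts = {}
    for pair in genotypes:
        for allele in pair:
            counts[allele] = counts.get(allele, 0) + 1
    # Expected heterozygosity from allele frequencies (no small-sample correction).
    he = 1 - sum((c / (2 * n_plants)) ** 2 for c in counts.values())
    if he == 0:
        return None
    # Observed heterozygosity: share of plants carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n_plants
    return 1 - ho / he


rare_allele_locus = [(1, 1)] * 92 + [(1, 2)]   # one heterozygote among 93 plants
all_heterozygous = [(1, 2)] * 10               # every plant heterozygous

print(round(inbreeding_coefficient(rare_allele_locus), 3))  # -0.005
print(inbreeding_coefficient(all_heterozygous))             # -1.0
```

Under this definition f is negative whenever Ho exceeds He, and it reaches -1 when every plant is heterozygous for the same two alleles.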
The Shannon-Weaver index (H') ranged from 0.03 to 1.05, with a mean of 0.62 (Table 3). This mean reveals the existence of high genetic variability in the J. curcas accessions of the collection, more than sufficient for the continuity of the breeding program.
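As a point of reference for these values, the index can be computed from allele frequencies as H' = -Σ p ln p. The sketch below assumes the natural logarithm, and the function name and frequency vectors are illustrative only. With k equally frequent alleles the index reaches its upper bound ln k, about 0.69 for two alleles and about 1.10 for three.

```python
import math


def shannon_index(freqs):
    """Return H' = -sum(p * ln p) over the alleles with non-zero frequency."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Upper bounds for loci with two and three equally frequent alleles.
two_allele_max = shannon_index([0.5, 0.5])
three_allele_max = shannon_index([1 / 3, 1 / 3, 1 / 3])

# One copy of a rare allele among 93 diploid plants (186 allele copies).
rare_allele = shannon_index([185 / 186, 1 / 186])

print(round(two_allele_max, 2), round(three_allele_max, 2), round(rare_allele, 2))
```

Read this way, a value of 1.05 at a three-allele locus lies close to the maximum attainable for that number of alleles, while a value near 0.03 corresponds to an almost monomorphic locus.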

Genetic variability within and between 14 accessions
The same five polymorphic SSR loci were used on 14 accessions, randomly selected from the 93, to evaluate the variability within and between them. A mean of 1.86 alleles per locus was found. The polymorphism information content (PIC) allowed quantification of the genetic polymorphism of each locus in the accessions evaluated. The highest PIC value observed was 0.35 for the locus SSRY 107, while the lowest value occurred for the locus JCENA 87, which did not distinguish within or between accessions (Table 3). The mean value of PIC (0.20) indicated a moderate level of polymorphism of the analyzed loci, with the exception of the locus JCENA 87.
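For reference, PIC as defined by Botstein et al. (1980) can be computed from allele frequencies as 1 - Σp_i² - ΣΣ 2p_i²p_j² (the double sum taken over pairs i < j). The following sketch uses hypothetical frequencies for a two-allele locus, chosen only so that He is about 0.46; they are not values from Table 3, and the function name is an assumption.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content of one locus (Botstein et al. 1980)."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term: 2 * p_i^2 * p_j^2 summed over all pairs of distinct alleles.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction


freqs = [0.64, 0.36]                        # hypothetical two-allele locus
he = 1 - sum(p * p for p in freqs)
print(round(he, 2), round(pic(freqs), 2))   # 0.46 0.35
```

For a two-allele locus PIC cannot exceed 0.375, which is reached when both alleles have frequency 0.5.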
Values of He ranged from 0 for the locus JCENA 87 to 0.46 in the locus SSRY 107, with mean of 0.25. Ho ranged from 0 for the locus JCENA 87 to 0.79 at the locus SSRY 107, with an average of 0.42 (Table 3).
The proportion of variability within and between the 14 accessions was evaluated by AMOVA. Most of the genetic variation (92.58%) was within the populations (Table 4). Thus, it can be concluded that there is considerable genetic variability within populations.
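The partition behind these percentages can be sketched as a one-level AMOVA in the spirit of Excoffier et al. (1992): sums of squared distances are split into within- and among-group parts and converted into variance components. This is a minimal illustration, not the procedure implemented in Genes; the squared distance used here (number of unshared alleles summed over loci), the function names and the toy genotypes are all assumptions.

```python
from collections import Counter
from itertools import combinations


def allele_mismatches(g1, g2):
    """Assumed squared distance: alleles of g1 not matched in g2, summed over loci."""
    return sum(
        sum((Counter(locus1) - Counter(locus2)).values())
        for locus1, locus2 in zip(g1, g2)
    )


def amova(groups, dist=allele_mismatches):
    """One-level AMOVA; groups is a list of lists of multilocus genotypes."""
    individuals = [g for members in groups for g in members]
    n_total, n_groups = len(individuals), len(groups)

    # Sums of squared deviations, total and within groups.
    ssd_total = sum(dist(a, b) for a, b in combinations(individuals, 2)) / n_total
    ssd_within = sum(
        sum(dist(a, b) for a, b in combinations(members, 2)) / len(members)
        for members in groups
    )
    ssd_among = ssd_total - ssd_within

    # Mean squares and variance components.
    msd_within = ssd_within / (n_total - n_groups)
    msd_among = ssd_among / (n_groups - 1)
    n0 = (n_total - sum(len(m) ** 2 for m in groups) / n_total) / (n_groups - 1)
    var_within = msd_within
    var_among = (msd_among - msd_within) / n0
    var_total = var_within + var_among
    return {
        "var_within": var_within,
        "var_among": var_among,
        "pct_within": 100 * var_within / var_total,
        "phi_st": var_among / var_total,
    }


# Toy data: two accessions, three plants each, one locus per plant.
accession_a = [((1, 1),), ((1, 1),), ((1, 2),)]
accession_b = [((2, 2),), ((2, 2),), ((1, 2),)]
result = amova([accession_a, accession_b])
print(result)
```

In this toy case half of the variation lies within accessions (pct_within = 50); the 92.58% reported in Table 4 is the analogous quantity for the 14 accessions of seven plants each.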

DISCUSSION
The characterization of germplasm collections using molecular markers is an important strategy for the success of breeding programs. Molecular markers have been used in several studies to characterize accessions of J. curcas. Sun et al. (2008) used SSR and AFLP markers to characterize 58 accessions from China, and found a low level of polymorphism. Tatikonda et al. (2009)
The evaluation of genetic variability, by means of microsatellite markers, in 93 accessions from the UFV's J. curcas collection was performed for the first time. This rich collection presents accessions from several Brazilian regions and abroad and its genetic evaluation is important for both genetic conservation and development of improved cultivars.
Table 3. Diversity statistics and frequency of the alleles A1, A2 and A3 for five polymorphic SSR loci used in the evaluation of genetic variability among 93 accessions of Jatropha curcas L. The same five loci were also used on 14 accessions, randomly selected among the 93 accessions, for evaluating the variability within and between them. Number of alleles (na), number of effective alleles (ne), observed (Ho) and expected (He) heterozygosity, coefficient of inbreeding (ƒ), Shannon-Weaver index (H') and polymorphism information content (PIC).
Regarding the genetic variability statistics obtained here, the number of alleles per locus (na), which ranged from 2 to 3 (mean=2.2), was similar to that found by several other authors. Santos et al. (2016), using 11 SSR markers, found 2 to 5 alleles per locus, whereas Bressan et al. (2012) found 2 to 8 alleles per locus. However, Na-ek et al. (2011) identified only 1.4 alleles per locus after evaluating 32 accessions. Rosado et al. (2010) found 1 to 2 alleles per locus in accessions of another Brazilian collection, and of the six microsatellite markers selected by them, four were monomorphic. According to Cruz et al. (2011), it is important to have polymorphic loci with sufficient numbers of alleles to infer the genetic variability of a population in relation to another, or to itself over time, especially when it is subjected to evolutionary forces that promote differentiation.
The number of effective alleles (ne) quantifies the alleles with significant frequency in a population. In the present study, ne ranged from 1.01 to 2.71. This range was higher than that found by Pecina-Quintero et al. (2014) (ne from 1.06 to 1.25), who evaluated genetic variability in nine J. curcas populations from Mexico. Wen et al. (2010), in a study of genetic variability of five J. curcas populations, found values for ne ranging from 1.45 to 1.68.
In addition to estimating the number of polymorphic loci, other quantitative measures can be adopted, such as Ho and He, which allow inferences about the genetic structure of the population. The means of Ho and He in the present study (Ho=0.64 and He=0.42) were similar to those found by Santos et al. (2016). Bressan et al. (2012) reported comparable values (Ho=0.53 and He=0.66), although with He higher than Ho, the opposite of what was observed in the present study and by Santos et al. (2016).
The coefficient of inbreeding or fixation index (f) is an important parameter in breeding programs that aim at the development of superior cultivars, as it measures the level of homozygosity in the population. In our study, f varied from -0.01 to -0.85, close to the values expected given the genetic nature of the accessions. Vásquez-Mayorga et al. (2017), evaluating accessions from Costa Rica, found a negative value (-0.10) for f, indicating that there was no crossing between relatives among accessions. Cruz et al. (2011) stated that negative values of the inbreeding coefficient are common when Ho is greater than He, suggesting an excess of heterozygous loci. Negative f values should be interpreted as estimates of null inbreeding, that is, there was no crossing among related individuals. J. curcas is a monoecious species, so loci are expected to be in the heterozygous state owing to its allogamous mating system, as evidenced by Sun et al. (2008), Rosado et al. (2010) and Wen et al. (2010).
The Shannon-Weaver index (H') has been used in genetic studies as a measure of genetic variability within populations and resembles a genotype richness index. Here, H' values varied from 0.03 to 1.05 (mean of 0.62), which revealed the existence of high genetic variability among the 93 accessions of our J. curcas collection. Wen et al. (2010) also used this index to verify genotypic richness in 45 accessions. These authors found an average value of 0.55 using SSR markers and suggested a high level of genetic variability to be explored in five J. curcas populations.
Among the accessions that showed the greatest genetic diversity, as revealed by the UPGMA algorithm (Figure 1), we highlight UFVJC 05, 07, 12, 18, and 53, collected in João Pinheiro, Montalvânia and Barbacena (MG). The greater variability present in the accessions collected in Minas Gerais corroborates the study by Dias et al. (2012), which considered the State of Minas Gerais a secondary center of genetic variability in J. curcas.
The formation of 11 clusters with the UPGMA algorithm indicates substantial genetic variability and structuring in the collection. Sun et al. (2008), evaluating 58 accessions using microsatellite markers, found low genetic variability. Rosado et al. (2010) analyzed the genetic variability of 192 accessions by means of RAPD and SSR markers, finding limited variability among the accessions. Na-ek et al. (2011) evaluated 32 plants from different regions of the world with five SSR markers and also recorded low genetic variability. Naresh et al. (2015) assessed genetic variability among 14 accessions from India using RAPD markers and observed considerable variability among accessions.
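How a UPGMA dendrogram yields a number of clusters can be illustrated with a small pure-Python sketch. It merges the two closest clusters at each step, averaging distances weighted by cluster size, and then cuts the tree at the largest jump between successive fusion levels, which is one possible reading of the "high-change points" criterion and not necessarily the rule applied to Figure 1. The function names, labels and distance matrix are hypothetical, and the sketch does not reproduce Mega7.

```python
def upgma(labels, matrix):
    """Return the list of merges (left, right, fusion distance) in UPGMA order."""
    n = len(labels)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (tree, size)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Closest pair of active clusters.
        (i, j), level = min(dist.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merges.append((tree_i, tree_j, level))
        # Size-weighted average distance from the new cluster to every other one.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return merges


def clusters_at_largest_jump(merges, n_leaves):
    """Number of clusters left when cutting at the largest jump in fusion level."""
    levels = [level for _, _, level in merges]
    jumps = [b - a for a, b in zip(levels, levels[1:])]
    cut = jumps.index(max(jumps)) + 1   # merges performed below the cut
    return n_leaves - cut


# Hypothetical distances among four accessions labelled A to D.
labels = ["A", "B", "C", "D"]
matrix = [
    [0.0, 2.0, 10.0, 10.0],
    [2.0, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 4.0],
    [10.0, 10.0, 4.0, 0.0],
]
merges = upgma(labels, matrix)
print(merges)
print(clusters_at_largest_jump(merges, len(labels)))
```

In this toy matrix the fusion levels are 2, 4 and 10, so the largest jump lies between the second and third merges and the cut leaves two clusters, (A, B) and (C, D).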
The seed collection strategy used to form our genebank, which prioritized a higher number of seeds per accession, may be what sets it apart in terms of substantial variability. Dias and Kageyama (1991) reported that knowledge of the level of genetic variation and its distribution within and between populations is critical, because it makes it possible to better target the breeding strategies to be adopted and thus maximize genetic gains through the selection cycles. In J. curcas, some studies have also detected a greater amount of genetic variability within populations (Bhering et al. 2015, Pioto et al. 2015, Sinha et al. 2015). This high concentration of genetic variation within populations implies sampling more plants per population, as recommended by Dias and Kageyama (1991) and Bhering et al. (2015). This strategy was practiced in our collection, in which priority was given to collecting more seeds per accession. The present study supports the soundness of our accession collection process, in which each accession was represented by 16 plants. Previous studies employing molecular markers evaluated collections with a reduced number of plants per accession, which is possibly the main reason why they have systematically revealed low genetic variability in the species.
The accessions UFVJC 05, 07, 12, 18 and 53 exhibited substantial diversity and can therefore compose a base population for breeding.
Our results demonstrated the existence of molecular genetic variability within and between accessions, indicating that our germplasm collection can be used as a base for a breeding program.