Molecular characterization of ‘sweet’ cassavas (Manihot esculenta) from a germplasm bank in Brazilian Eastern Amazonia

Genetic variability of a set of 81 accessions of ‘sweet’ cassavas (Manihot esculenta) collected mostly in the North region of Brazil was investigated with nine microsatellite loci. All loci were polymorphic, with a mean of 6.33 alleles per locus. Analyses indicated that 35 multilocus profiles were each represented by a single accession, while the remaining 46 accessions showed non-unique profiles, represented by eleven genotypes. In total, 46 different multilocus profiles were detected. Most of the putative duplicated accessions were collected in different locations. After the removal of putative duplicated genotypes, genetic parameters were estimated and expected heterozygosity was high (HE=0.73), indicating genetic variability. Structure analysis of this set of ‘sweet’ cassavas divided the 46 genotypes into two clusters (K=2), and a few genotypes had mixed ancestry. Results indicated a habit of exchanging materials among farmers of the North region of Brazil, and genetic variability that can be exploited in breeding efforts.


INTRODUCTION
Cassava (Manihot esculenta Crantz) is usually classified into two categories: 'sweet' and 'bitter'. This division is based on the cyanogenic glycoside content of the roots. 'Sweet' cassava has a low cyanogenic glycoside content (below 100 ppm fresh weight), while 'bitter' cassava has higher amounts (above 100 ppm fresh weight) (McKey et al. 2010). Roots of bitter cassava must be properly cooked or efficiently processed to detoxify them before consumption, and they are normally used to produce starch and flour. 'Sweet' cassavas, known as 'macaxeiras' in the Amazon, are commonly eaten after simple preparation, such as boiling. 'Sweet' cassavas are consumed in many ways: in cakes, soups, fried, boiled, or as substitutes for potatoes.
Since 'sweet' cassavas are used to obtain different products from those of 'bitter' cassavas, farmers tend to select and maintain genotypes according to specific criteria, such as taste and fast cooking. They also harvest 'sweet' cassavas earlier than 'bitter' types, in order to avoid the formation of fibers caused by root lignification. This distinct type of selection, together with the separation that farmers maintain between the two types on their farms, may have contributed to the slight genetic differentiation detected by molecular markers in South America (Mühlen et al. 2000, Elias et al. 2004, Alves-Pereira et al. 2011, Bradbury et al. 2013), which did not occur in Africa (Bradbury et al. 2013). Due to these differences in usage, 'sweet' and 'bitter' cassava breeding programs are specific for each type (Fukuda et al. 2002, Ceballos et al. 2004).
Due to the great genetic diversity of cassava in Brazil, a portion of this variability is kept in germplasm banks throughout the country (Fukuda et al. 2002). Knowledge of the genetic variability of cassava maintained in germplasm banks is essential to support breeding. Phenotypic variation is the most important to evaluate, since it is the basis for selecting agronomically important genotypes and is used to guide crossings. The molecular characterization of accessions is complementary for several reasons: to determine the genetic variability, to identify repeated accessions (duplicates), to determine heterotic groups and to organize accessions with more precise identification. The identification of duplicates makes it possible to reduce the bank size, which improves its management and reduces costs.
Although there is much information regarding the genetic diversity of cassava, most of it concerns samples from Africa (Mkumbira et al. 2003, Moyib et al. 2007), from the Southeast of Brazil (Siqueira et al. 2010, Ribeiro et al. 2011, Costa et al. 2013), or from the West of Amazonia (Mühlen et al. 2000, Elias et al. 2004, Siqueira et al. 2009, Alves-Pereira et al. 2011). Pará is the main producer of cassava in Brazil, and its population maintains old habits of cassava use inherited from the indigenous people, such as the use of cassava leaves and starch to produce regional foods. However, there are no studies regarding the genetic diversity of cassava sampled in this state. Thus, the aim of this study was to evaluate the genetic diversity of sweet cassavas kept in a germplasm bank of the state of Pará, using microsatellite markers.

MATERIAL AND METHODS
A total of 81 accessions identified as 'sweet' cassava from the cassava germplasm bank of Embrapa Eastern Amazon, in Belém, Pará, Brazil, were used in this study (Table 1). Each accession represents a clonal landrace, and the germplasm bank is maintained by vegetative propagation. The names given by local farmers were retained for each landrace. Collections were carried out mostly in the state of Pará (Table 1). Analyses carried out at the institution confirmed the low HCN content of the accessions identified as 'sweet' cassavas.
Total genomic DNA was extracted following a procedure similar to the protocol of Doyle and Doyle (1990). Leaves were macerated with liquid nitrogen and polyvinylpyrrolidone, and 3 ml of CTAB extraction buffer (CTAB 2%, 5 M NaCl, 0.5 M EDTA, PVP, 1 M Tris-HCl and sterilized water) was added to the macerate. The mixture was homogenized and incubated in a water bath at 65 °C for one hour. Chloroform:isoamyl alcohol (24:1) was added, the extract was homogenized, and samples were centrifuged for 10 minutes at 10,000 rpm. Three ml of 95% ethanol was added to the supernatant to precipitate the DNA, and samples were centrifuged for 10 minutes at 10,000 rpm. The precipitate was then washed with 70% ethanol and centrifuged for 10 minutes at 5,000 rpm. DNA samples were resuspended in 300 µL of TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) with RNAse. DNA was quantified on 1% agarose gel, using lambda phage DNA at different concentrations (50, 100 and 200 ng mL⁻¹) as standard. DNA was then diluted to 10 ng µL⁻¹.
The amplification reactions contained 15 ng of DNA, 100 μM of each dNTP, 0.2 μM of each primer and 1 U of Taq polymerase (Invitrogen, São Paulo, Brazil) in 1X PCR buffer (50 mM KCl; 10 mM Tris-HCl, pH 8.8; 0.1% Triton X-100; 1.5 mM MgCl2) in a total volume of 13 μL. Amplifications were carried out in a GeneAmp 9600 thermocycler (Applied Biosystems, Foster City, CA, USA) with the following program: an initial step of 94 °C for 4 min, followed by 30 cycles of 94 °C for 30 s, 60 s at the appropriate annealing temperature for each primer (Table 2), and extension at 72 °C for 60 s (Lanaud et al. 1999). The primer-tail approach was used (Oetting et al. 1995). Sequences of the forward primers of the human microsatellites D12S1090, D8S1132 and DYS437 were used as tails according to Missiaggia and Grattapaglia (2006), with modifications. The T3 sequencing primer was labeled with the fluorescent dye PET.
PCRs were carried out independently, and genotyping was multiplexed when possible, using an aliquot of 1 μL of each reaction from the same genotype. The total PCR volumes were mixed with Hi-Di formamide (Applied Biosystems) and 0.5 µL of the Liz-500 molecular size standard (Applied Biosystems), at 8 nM. This allowed multiplexed genotyping on the ABI PRISM 3130 DNA sequencer (Applied Biosystems, Foster City, CA, USA). Data collection and analysis were carried out using GeneMapper v.4.0 software (Applied Biosystems, USA).
Before the analysis of genetic diversity, genotypes with identical multilocus profiles were identified. The probability of identity, i.e. the probability that two individuals sampled at random share the same genotype, was calculated according to Paetkau and Strobeck (1994), based on the allele frequencies obtained for all accessions. Analyses were carried out using the Identity 1.0 software (Wagner and Sefc 1999).
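As an illustration of the probability of identity, the following minimal Python sketch sums the squared expected frequencies of all homozygous and heterozygous genotypes at each locus and multiplies the per-locus values, assuming Hardy-Weinberg proportions and independent loci. It is not the Identity 1.0 implementation; the function names and the allele frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def locus_pi(freqs):
    """Probability of identity at one locus, given its allele frequencies."""
    # Two random individuals are both homozygous for the same allele.
    homozygotes = sum(p ** 4 for p in freqs)
    # Two random individuals are both the same heterozygote.
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def multilocus_pi(freqs_by_locus):
    """Product of per-locus values, assuming the loci are independent."""
    return prod(locus_pi(freqs) for freqs in freqs_by_locus)


# Hypothetical allele frequencies for two loci (not data from this study).
example_freqs = [[0.5, 0.5], [0.25, 0.25, 0.5]]
example_pi = multilocus_pi(example_freqs)
```

The multilocus value shrinks quickly as loci are added, which is why a small panel of polymorphic microsatellites can already make chance identity very unlikely.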
Statistical analyses were based only on unique genotypes. The following genetic diversity parameters were estimated: total (A) and mean (Ā) number of alleles, number of effective alleles (NE), observed (HO) and expected (HE) heterozygosities, number of unique alleles (UA), and inbreeding coefficient (f) (Weir and Cockerham 1984), for each locus and for the total set of sweet cassavas. Analyses were carried out using the GenAlEx 6.4.1 software (Peakall and Smouse 2012). Null alleles for each locus were assessed with the Cervus 3.0.3 software (Kalinowski et al. 2007).
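The per-locus parameters listed above can be sketched in a few lines of Python. This is a simplified, frequency-based illustration (HE = 1 − Σp², NE = 1/Σp², f = 1 − HO/HE), not the GenAlEx code, and it does not reproduce the Weir and Cockerham (1984) estimator of f; the function name and the genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Diversity statistics for one locus from a list of diploid genotypes."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_alleles = len(alleles)
    # Sum of squared allele frequencies (expected homozygosity).
    sum_p2 = sum((count / n_alleles) ** 2 for count in Counter(alleles).values())
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = 1.0 - sum_p2
    return {
        "A": len(set(alleles)),        # number of alleles
        "NE": 1.0 / sum_p2,            # effective number of alleles
        "HO": ho,                      # observed heterozygosity
        "HE": he,                      # expected heterozygosity
        "f": 1.0 - ho / he if he > 0 else 0.0,  # simple fixation index
    }


# Hypothetical genotypes (allele sizes in bp) for four accessions at one locus.
example_stats = locus_diversity([(100, 104), (100, 100), (104, 108), (100, 104)])
```

A negative f, as in this toy example, indicates an excess of heterozygotes relative to the expectation; a positive f indicates a deficit.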
For the construction of the dendrogram, bands were scored as present (1) or absent (0) for the 46 accessions. These data were used to construct a dendrogram based on the Unweighted Pair Group Method with Arithmetic Mean (UPGMA, Sneath and Sokal 1973), using the Jaccard similarity index. A bootstrap resampling method was used to determine the robustness of the dendrogram, with 1,000 bootstrap replicates obtained from the original data of the 46 accessions. All calculations were carried out using the Past software (Hammer et al. 2001).
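The presence/absence coding and UPGMA clustering described above can be sketched as follows. This is a minimal pure-Python illustration, not the Past implementation, and it omits the bootstrap step; the function names, accession labels and binary profiles are hypothetical.

```python
from itertools import combinations


def jaccard(u, v):
    """Jaccard similarity between two presence/absence vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0


def upgma(names, profiles):
    """Return the UPGMA merge order as nested tuples of accession names."""
    # Each cluster is (subtree, list of member indices).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def average_distance(c1, c2):
        # Arithmetic mean of all pairwise Jaccard distances between members.
        total = sum(1.0 - jaccard(profiles[i], profiles[j])
                    for i in c1[1] for j in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: average_distance(clusters[ab[0]], clusters[ab[1]]))
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


# Hypothetical accessions scored for four alleles (1 = present, 0 = absent).
example_names = ["acc1", "acc2", "acc3"]
example_profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
example_tree = upgma(example_names, example_profiles)
```

In the toy data, acc1 and acc2 share two of three scored alleles and are joined first, and acc3 is attached to that pair afterwards.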
Assignment of genotypes to clusters and relatedness among clusters were assessed with the Structure 2.3.3 software (Pritchard et al. 2000). The non-admixture model and the independent alleles model, without prior population information, were used. Following a burn-in period of 250,000 iterations, 10 independent runs were carried out for each value of K (from 1 to 10), with 250,000 replications each. The most likely value of K was chosen using the ∆K statistic of Evanno et al. (2005).
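The ∆K criterion can be illustrated with a short Python sketch. It uses one common way of computing the statistic, namely the absolute second-order difference of the mean log-likelihood across runs divided by the standard deviation of the log-likelihood at that K; this is an assumption about the calculation, not a description of the exact procedure used here. The function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Map K -> delta K, given K -> list of ln P(D|K) values (one per run)."""
    result = {}
    for k in sorted(lnp_by_k):
        # Delta K needs both neighbours, so the smallest and largest K are skipped.
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Hypothetical log-likelihoods from three runs per K (not data from this study).
example_lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
example_delta = delta_k(example_lnp)
best_k = max(example_delta, key=example_delta.get)
```

Because the statistic relies on the values at K − 1 and K + 1, it cannot be evaluated for K = 1 or for the largest K tested.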

RESULTS AND DISCUSSION
The nine loci amplified 57 alleles in the 81 'sweet' cassava accessions, with a mean of 6.33 alleles per locus (Table 2). All loci were polymorphic, and the number of alleles ranged from three (GA136) to nine (SSRY19). Genotyping revealed that 35 multilocus profiles were each represented by a single accession, while the remaining 46 accessions showed non-unique profiles distributed in eleven groups of genotypes. Thus, 46 different multilocus profiles were detected. The most common genotype was detected in nine accessions, while five genotypes were detected in only two accessions each (Table 3). Although the microsatellite loci did not cover a large portion of the cassava genome, the probability of identity, i.e., the probability that these genotypes were identical by chance, was 9.3 × 10⁻⁹, which indicates that the loci were informative. Accessions that presented the same multilocus profile were considered putative duplicates. With a larger number of molecular markers, some putative duplicates might be separated, but they would still represent closely related genotypes, which can be grouped in the germplasm bank.
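The identification of putative duplicates amounts to grouping accessions that share the same multilocus profile, which can be sketched in Python as follows. This is only an illustration of the idea; the function name, accession labels and genotypes are hypothetical.

```python
from collections import defaultdict


def group_by_profile(genotypes):
    """Group accession names that share an identical multilocus profile."""
    groups = defaultdict(list)
    for name, profile in genotypes.items():
        # Sort alleles within each locus so (100, 104) matches (104, 100).
        key = tuple(tuple(sorted(locus)) for locus in profile)
        groups[key].append(name)
    return list(groups.values())


# Hypothetical accessions genotyped at two loci (allele sizes in bp).
example_genotypes = {
    "acc_A": [(100, 104), (210, 210)],
    "acc_B": [(104, 100), (210, 210)],  # same profile as acc_A
    "acc_C": [(100, 108), (210, 214)],
}
example_groups = group_by_profile(example_genotypes)
duplicate_groups = [g for g in example_groups if len(g) > 1]
```

The number of groups returned corresponds to the number of distinct multilocus profiles, and groups with more than one member are the putative duplicates.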
Most of these putative duplicated accessions were collected in the Northern region of Brazil, more specifically in the state of Pará, located in Eastern Amazonia, and many were sampled in different locations (Table 3). As an example, accessions from group 1 were sampled in Moju, Santa Luzia, Belterra and Santarém, in the state of Pará, and in Pedra Branca do Amapari, in the state of Amapá. In this last municipality, the same genotype was sampled three times, with three different names. Collections were mainly carried out on farms, where genotypes are cultivated and propagated. The occurrence of closely related genotypes in distant locations may indicate that a genotype is very productive or has a distinct characteristic that is appreciated by farmers. The occurrence of duplicates over distinct locations has been described in Amazonia. Identical genotypes for landraces with different names were identified in a small stretch of the Madeira River, in the state of Amazonas, Brazil (Alves-Pereira et al. 2011). After extensive agromorphological characterization, it should be decided whether duplicated accessions will be discarded or treated as groups. Nevertheless, this analysis was useful for the selection of genotypes that are being tested for recommendation in several locations in Pará.
After the removal of accessions with the same multilocus profile, the sample size was reduced to 46 genotypes, and genetic diversity parameters were estimated (Table 2). Total HE was 0.73, and the values per locus ranged from 0.59 (GA136) to 0.82 (SSRY19). Total HO was 0.71, ranging from 0.48 (SSRY164) to 0.87 (GA126). The inbreeding coefficient was 0.03, and loci SSRY164 and SSRY21 showed high levels. These high heterozygote deficits in the two loci may be due to the null alleles detected, as shown in Table 3. Nine unique alleles (the ones that appear in only one genotype) were identified (Table 3). Locus SSRY21 showed the highest number of private alleles (three), and genotypes M. Branquinha 2 and M. Boa Fama had two private alleles each. The HE value was high, and the occurrence of private alleles is indicative of genetic divergence. This value  (Elias et al. 2004, Alves-Pereira et al. 2011) and other regions of Brazil (Siqueira et al. 2010, Costa et al. 2013). The habit of consuming 'sweet' cassava in the North region of Brazil may have influenced its maintenance by local breeders. Although most of the cassava production in Pará is destined for flour, farmers maintain the tradition of cultivating 'sweet' cassava for their personal consumption, in the preparation of several meals (Cardoso et al. 1999, Carvalho et al. 2010), and for commercialization on a smaller scale. The commercialization of 'sweet' cassava in Pará occurs mainly through the sale of fresh roots in markets and farmers' markets. There are few associations in Pará that commercialize minimally processed 'sweet' cassava roots, and some of them process roots for chips (Carvalho et al. 2010). In the North region, there is little adoption of advanced cultivation methods, such as mechanized planting and the use of erect cultivars in dense plantations, which are more commonly used in the Southeast region of Brazil.
Moreover, plantations are composed of a mixture of landraces, and farmers from each micro region generally use their preferred materials, which may be an effect of the high genotype x environment interaction detected for cassava (Farias-Neto et al. 2013). This may have contributed to the high genetic variability detected in the germplasm bank. The hypothesis that high genetic variability is maintained in places that adopt few agricultural traits has been proposed (Emperaire and Peroni 2007). A study on 'sweet' cassava sampled in the State of Paraná, Brazil, where advanced cultivation methods are widely adopted, showed lower genetic variability (Costa et al. 2013).
Strong genetic relationships could be detected among genotypes based on bootstrap values (Figure 1). The number of genetic clusters (K) was estimated as two by the method proposed by Evanno et al. (2005), based on calculations generated by the Structure 2.3.3 software. In general, the dendrogram separated the genotypes into two clusters, as revealed by Structure. Accessions were sampled on farms and have been kept by farmers, which may have influenced the lack of structuring according to geographic region. The high incidence of duplicates is indicative of exchanges among farmers from different locations, which may have contributed to the lack of structuring of genetic variability according to the geographic areas sampled. There was no clustering among genotypes from the same city, or even from the same state. The same was observed by Vieira et al. (2010) and Vieira et al. (2011), who analyzed 'sweet'  (Siqueira et al. 2010, Siqueira et al. 2011, Costa et al. 2013). For all genotypes except two, the overall membership proportion (Q) was at least 80%, indicating a clear assignment of genotypes to two clusters that did not separate according to geographic area (Figure 1). Accessions M. Pão Manaus and group 11 of accessions (Table 3) had Q higher than 20%. Most of the putative duplicated accessions were sampled in different places of Northern Brazil. Thus, clusters generated by Structure could not be associated with geographical area. Further studies are needed to try to correlate clusters with agromorphological characters and to verify whether they represent heterotic groups, since the assignment of genotypes to heterotic groups is one of the goals of the use of molecular markers in cassava breeding (Ferguson et al. 2012). However, bootstrap values of some clusters in the dendrogram revealed the most closely related accessions in the germplasm bank (Figure 1), and crossings among accessions within these clusters should be avoided.
The results showed the existence of genetic variability in a sample of 'sweet' cassavas from the North region of Brazil. Sampling was carried out mostly in the Eastern portion of Brazilian Amazonia, and the genetic variability of cassavas from this region has not been extensively evaluated. Knowledge about the genetic composition of cassava samples collected near the center of domestication of the species is very important, since it may interest breeders from several parts of the world. The study also showed a high occurrence of putative duplicated accessions in the germplasm bank, as a result of genotypes being spread over an extensive area.