Genetic diversity in Brazilian tall coconut populations by microsatellite markers

The tall coconut palm was introduced in Brazil in 1553, originating from the island of Cape Verde. The aim of the present study was to evaluate the genetic diversity of ten populations of Brazilian tall coconut using 13 microsatellite markers. Samples were collected from 195 individuals of 10 different populations. A total of 68 alleles were detected, with an average of 5.23 alleles per locus. The mean expected and observed heterozygosity values were 0.459 and 0.443, respectively. The number of alleles per population ranged from 36 to 48, with a mean of 40.9 alleles. We observed the formation of two groups, the first formed by the populations of Baía Formosa, Georgino Avelino and São José do Mipibu, and the second by the populations of Japoatã, Pacatuba and Praia do Forte. These results reveal a high level of genetic diversity in the Brazilian populations.


INTRODUCTION
The coconut palm consists of only one species (Cocos nucifera L.) and is composed of two main varieties: the tall ('Typica') and the dwarf ('Nana') coconut. It is the most widely distributed and extensively grown palm tree, in addition to being one of the most important tropical species used (Persley 1992, Gunn et al. 2011).
Southeast Asia is probably the main center of domestication of the species because the greatest morphological variability occurs in that region, as well as the greatest number of local names, different uses of the plant and the greatest number of associated insect species (Persley 1992). From Southeast Asia, the coconut was taken to India and from there to east Africa. After discovery of the Cape of Good Hope, the plant was taken to West Africa and from that region to the Americas and the entire tropical region of the globe (Purseglove 1975). The tall variety was introduced in Brazil in 1553, coming from the island of Cape Verde.
In the Northeast of Brazil, where most of the Brazilian coconut plantations are concentrated, there are some tall coconut populations that were established more than 80 years ago, and very little is known in regard to their genetic variability. Obtaining this information is fundamental for use in crop breeding programs because these populations may constitute an excellent source of adapted germplasm (Ribeiro et al. 1999).
The tall coconut has been cultivated in Brazil for more than 450 years. The planted area is currently estimated at approximately 270,000 hectares, with production of around 2.9 million tons of fruit per year (FAO 2013). The Northeast stands out as the region with the greatest planted area and is responsible for more than 80% of domestic coconut production (IBGE 2013). Distributed across this region, the coconut plantations either adapted to different environmental conditions and became divergent (Ribeiro et al. 1999) or diverged through independent introductions, and they came to be characterized as ecotypes of the tall variety.
DNA-based markers have been indicated as the most adequate for studies of genetic diversity in coconut (Lebrun et al. 1995, Wadt et al. 1997), while isoenzymes (Benoit and Ghesquière 1984) and leaf polyphenols (Jay et al. 1989) have not provided conclusive results. Microsatellite markers (SSR) have proven to be the most powerful tool in analyses of population structure because they are multiallelic, highly polymorphic and based on the PCR reaction (Chase et al. 1996, Morgante et al. 1996). The aim of this study was to evaluate genetic diversity in ten Brazilian tall coconut populations by means of microsatellite markers (SSR), with a view toward characterization of genetic variability.

MATERIAL AND METHODS
Plant material
Selection of populations for this study was made after genetic prospecting of Brazilian tall coconut and definition of pure (typical) populations of the tall variety. Purity is evaluated based on the criteria of legitimacy, homogeneity and isolation. Legitimacy is defined as a function of age, the ideal being populations more than 80 years old: because the dwarf variety was introduced in the country in 1925, the risk of natural hybrids between the two varieties is avoided. Homogeneity takes the constitution of the populations into consideration: they must be composed only of plants of the tall variety. Isolation is another important factor: there must be a minimum distance of 1000 meters, or at least 500 meters if there is a plant barrier, separating the population from other coconut plantations, especially dwarf plantations (Ribeiro et al. 2002).

DNA extraction
Leaflets from 195 plants were collected in ten Brazilian tall coconut populations. From each plant, chosen at random, a segment of approximately 50 cm of leaflets was taken from the youngest leaf. DNA was extracted according to the modified CTAB protocol, adapted to coconut (Lebrun et al. 1998, Baudouin and Lebrun 2002). The concentration of the extracted DNA was determined by automatic fluorometric quantification.

SSR analyses
A final volume of 25 µL was prepared for the PCR reaction, containing a mixture composed of 2.5 µL of 10X PCR buffer; 2.0 µL of dNTP (2 mM of each dNTP); 0.25 µL of MgCl2 (50 mM); 0.5 µL of Forward Primer (10 µM); 0.5 µL of Reverse Primer (10 µM); 0.5 µL of Taq DNA Polymerase (2 U µL⁻¹); 5 µL of genomic DNA (2.5 ng µL⁻¹) and 13.75 µL of sterile water. The PCR reaction cycle consisted of an initial denaturation at 94 °C for five minutes, followed by 36 cycles of 30 seconds at 94 °C for denaturation, one minute at 51 °C for annealing of the primer and one minute at 72 °C for extension, and then one additional extension period of five minutes at 72 °C.
For amplification of the SSRs, 13 pairs of specific primers designed and selected by Baudouin and Lebrun (2002) were used (Table 3). The amplified fragments were visualized in polyacrylamide gels by means of the Licor IR2 4200 sequencer. The bands were read and genotyped according to the standard marker for coconut (1 kb).
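The reaction mixture described above can be checked and scaled with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: the component names, the `master_mix` helper and the 10% pipetting overage are illustrative assumptions, and the volumes are simply the per-reaction amounts listed in the text, all in the same unit.

```python
# Per-reaction volumes as listed in the text (all in the same unit).
COMPONENTS = {
    "10X PCR buffer": 2.5,
    "dNTP mix": 2.0,
    "MgCl2": 0.25,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "Taq DNA polymerase": 0.5,
    "genomic DNA": 5.0,
    "sterile water": 13.75,
}


def total_volume(components):
    """Sum of all per-reaction volumes."""
    return sum(components.values())


def master_mix(components, n_reactions, overage=0.10, template="genomic DNA"):
    """Scale every component except the template for n reactions.

    The overage (an assumption, 10% by default) compensates for
    pipetting losses; the template is added to each tube separately.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 3)
        for name, volume in components.items()
        if name != template
    }


if __name__ == "__main__":
    print("Volume per reaction:", total_volume(COMPONENTS))
    for name, volume in master_mix(COMPONENTS, n_reactions=195).items():
        print(f"{name}: {volume}")
```

The listed components add up to the stated final volume, which is a quick consistency check on the recipe.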

Statistical analyses
The individuals analyzed were genotyped, and the resulting genotype matrix was used to calculate the allele and genotype frequencies at each locus and the genetic distances between individuals and between populations.
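As an illustration of how allele frequencies and heterozygosity follow from a genotype matrix, the sketch below works on a single locus. It is not the software used in the study: the data layout (one tuple of two allele sizes per individual, `None` for a missing genotype) and the function names are assumptions, and expected heterozygosity is computed as 1 minus the sum of squared allele frequencies without a small-sample correction, since the text does not state which estimator was applied.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Relative frequency of each allele at one locus.

    genotypes: list of (allele_a, allele_b) tuples, None if not scored.
    """
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), uncorrected for sample size."""
    freqs = allele_frequencies(genotypes)
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Ho = proportion of scored individuals carrying two different alleles."""
    scored = [g for g in genotypes if g is not None]
    return sum(1 for a, b in scored if a != b) / len(scored)


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for five plants at one locus.
    locus = [(100, 100), (100, 104), (104, 104), (100, 104), None]
    print(allele_frequencies(locus))
    print(expected_heterozygosity(locus), observed_heterozygosity(locus))
```

Averaging these per-locus values over all loci gives the population means of the kind reported in the tables.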
For understanding of the historical-evolutionary aspects of the populations, the Factorial Correspondence Analysis (FCA) technique was used (Perrier et al. 1999). The technique consists of reduction of an n-dimensional hyperspace to a space of few dimensions, in this case a three-dimensional space, making it possible to observe the relative position of the populations and their possible association in groups of similarity. The Genetix 4.03 software (Belkhir et al. 2001) was used for this purpose.
Basic genetic statistics were obtained with the GeneClass2 software. Its "individual assignment" option, with the criterion of Rannala and Mountain (1997), was used to allocate the individuals to the reference populations, and its statistics option was used for the remaining descriptive statistics. Classification of the individuals according to the reference population is based on the score of each individual, computed as the negative base-ten logarithm of the likelihood, so that individuals are classified according to the probability of belonging to their population of origin.
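The idea behind likelihood-based assignment can be sketched as follows. This is a simplified illustration and not the computation performed by the assignment software: allele frequencies in each reference population are smoothed with a pseudocount of 1/k (k being the number of alleles at the locus), genotype probabilities are taken under random mating, and loci are treated as independent. All names and the data layout are assumptions.

```python
import math
from collections import Counter


def log10_likelihood(individual, reference, n_alleles):
    """log10 probability of a multilocus genotype in one reference population.

    individual: dict locus -> (allele_a, allele_b)
    reference:  dict locus -> Counter of allele counts in the population
    n_alleles:  dict locus -> number of alleles known at that locus (k)
    """
    total = 0.0
    for locus, (a, b) in individual.items():
        counts = reference[locus]
        k = n_alleles[locus]
        n = sum(counts.values())
        # Smoothed frequencies, so an unseen allele never gives probability 0.
        pa = (counts[a] + 1.0 / k) / (n + 1.0)
        pb = (counts[b] + 1.0 / k) / (n + 1.0)
        prob = pa * pa if a == b else 2.0 * pa * pb
        total += math.log10(prob)
    return total


def assign(individual, references, n_alleles):
    """Return the best population and the -log10 likelihood score per population."""
    scores = {
        pop: -log10_likelihood(individual, ref, n_alleles)
        for pop, ref in references.items()
    }
    best = min(scores, key=scores.get)  # lowest score = highest likelihood
    return best, scores


if __name__ == "__main__":
    # Hypothetical one-locus reference data for two populations.
    references = {
        "A": {"L1": Counter({100: 8, 104: 2})},
        "B": {"L1": Counter({100: 1, 104: 9})},
    }
    print(assign({"L1": (100, 100)}, references, {"L1": 2}))
```

An individual counts as well classified when the population with the lowest score is its population of origin. In practice the individual being tested should be left out of its own reference sample before scoring.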

RESULTS AND DISCUSSION
The allelic diversity measured in the 195 individuals of the ten populations of Brazilian tall coconut is shown in Table 1. The 13 loci in combination produced a total of 68 alleles, with a mean of 5.23 alleles per locus, ranging from two alleles for the locus CnCir E12 to 13 alleles for the locus CnCir E2. Expected heterozygosity (He) ranged from 0.034 for the locus CnCir A3 to 0.711 for the locus CnCir E2, with a general mean value of 0.459. The loci CnCir A3 and CnCir E2 also exhibited the lowest (0.036) and the highest (0.671) values of observed heterozygosity (Ho), respectively, with a mean value of 0.443 considering all the populations. No monomorphic loci were found in this set of populations. These results reflect a high level of diversity and indicate that the variation within the populations makes the greatest contribution to total variation.
The mean number of alleles per locus was similar to that of other studies performed on coconut. Rivera et al. (1999), evaluating 20 accessions of coconut from the Philippines with 38 SSR loci, found a mean of 5.2 alleles per locus, ranging from 2 to 9 alleles per locus, with a total of 198 alleles. Genetic diversity ranged from 0.141 to 0.809, with a mean of 0.547. Perera et al. (2000), for their part, evaluating 130 plants representing 94 ecotypes from different regions of Sri Lanka with 8 SSR loci, found a mean of 6.4 alleles per locus, ranging from 3 to 9 alleles per locus, with a total of 50 alleles. Genetic diversity ranged from 0.386 to 0.762, with a mean value of 0.589. The mean value of genetic diversity found in this study was lower than the values reported by these authors. That may be related to the common origin of Brazilian coconut. It should be noted that the studies cited above were performed on ex situ collections, which gather materials from diverse genetic and geographic origins. Konan et al. (2011), evaluating 25 plants from five accessions of dwarf coconut from the Philippines with 12 SSR loci, found a total of 40 alleles, with 2 to 6 alleles per locus and a mean value of 3.33 alleles per locus. The mean value of expected heterozygosity was [...].
The number of individuals sampled per population (N), total number of alleles (A), mean number of alleles per locus (NA), expected heterozygosity (He) and observed heterozygosity (Ho) in the populations are shown in Table 2. Considering all the loci, expected heterozygosity (He) ranged from 0.358 to 0.552 among the populations, with the lowest value found in the population from Georgino Avelino (4) and the greatest value in the population from Japoatã (8), with a general mean of 0.459. Observed heterozygosity ranged from 0.315 in the Georgino Avelino population (4) to 0.553 in the Pacatuba population (9), with a general mean of 0.443.
The estimate of expected heterozygosity in this study was less than that found by Perera et al. (2001) in ex situ collections of tall coconut in Sri Lanka. These authors found values that ranged from 0.426 to 0.846 in the populations, with a mean of 0.682.
For the 13 loci used, the total number of alleles per population ranged from 36 in the Merepe population (7) to 48 in the Japoatã population (8), with an overall mean value of 40.9 alleles per population. It should be noted that the populations that exhibited the greatest total numbers of alleles, 48 (Japoatã), 47 (Pacatuba) and 42 (Praia do Forte), were not the populations with the greatest sample size. This leads us to infer that the results were not affected by the sampling, because it was performed at random and the sample size was sufficient to represent the population characteristics (Table 2).
The number of exclusive alleles and their respective frequencies per locus are shown in Table 3. The number of exclusive (private) alleles per population ranged from 1 to 3, with a total of 12 private alleles (17.65% of the 68 alleles) distributed over six of the ten populations evaluated. Although the populations share most of the 68 alleles, a small number of alleles characterize particular populations: three each in the populations of Georgino Avelino, Japoatã and Pacatuba, and one each in the populations of Santa Rita, Baía Formosa and Praia do Forte. Four of these alleles may be considered common localized alleles, i.e., they are present in a single population but with a frequency ≥ 5%. The strategy of collecting small samples in many populations may be used to capture these alleles, and it is in this category that breeders concentrate their efforts, since dispersed common alleles do not present problems and may be captured even in small samples collected from few populations.
Of the 12 exclusive alleles, eight may be considered rare localized alleles, i.e., they are found in a single population with a frequency of less than 5%. A rare sporadic allele was also found, i.e., an allele with a frequency of less than 5% but present in three of the ten populations, all three lying to the south of Natal. The populations of Santo Inácio, Luís Correia, São José do Mipibu and Merepe did not exhibit exclusive alleles (Table 3).
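The classification of exclusive alleles described above can be expressed compactly. The sketch below assumes per-population allele frequencies at one locus are already available; the function name, the data layout and the generic "shared" label for alleles found in more than one population are illustrative choices, while the 5% threshold and the "common localized" and "rare localized" labels follow the text.

```python
def classify_alleles(freqs_by_pop, threshold=0.05):
    """Label each allele at one locus by how it is distributed.

    freqs_by_pop: dict population -> dict allele -> frequency
    Returns dict allele -> (label, population or None).
    """
    # Collect, for every allele, the populations in which it occurs.
    carriers = {}
    for pop, freqs in freqs_by_pop.items():
        for allele, freq in freqs.items():
            if freq > 0:
                carriers.setdefault(allele, {})[pop] = freq

    labels = {}
    for allele, pops in carriers.items():
        if len(pops) == 1:
            # Exclusive (private) allele: present in a single population.
            pop, freq = next(iter(pops.items()))
            label = "common localized" if freq >= threshold else "rare localized"
            labels[allele] = (label, pop)
        else:
            labels[allele] = ("shared", None)
    return labels


if __name__ == "__main__":
    # Hypothetical frequencies at one locus in three populations.
    example = {
        "pop1": {100: 0.90, 104: 0.08, 108: 0.02},
        "pop2": {100: 1.00},
        "pop3": {100: 0.97, 112: 0.03},
    }
    for allele, label in sorted(classify_alleles(example).items()):
        print(allele, label)
```

Because an allele is labeled exclusive only relative to the populations and plants actually sampled, the labels depend on sample size.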
It should be noted that the number of exclusive alleles may be directly related to the size of the sample. The classification of a given allele as exclusive is therefore valid only for the sample in question, since an allele that is also present in another population may simply not have been captured by a small sample.
In coconut, a sample composed of 50 to 100 plants per population collected at random was considered adequate for most cases in capturing at least a copy of each allele that occurs with a frequency greater than 0.05 (Marshall and Brown 1975). These authors suggest that rare alleles probably have low adaptive value and are of less interest to breeders. Nevertheless, it would be strategic to collect at least one copy of each allele that occurs with a frequency of less than 0.05.
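The link between sample size and the chance of capturing an allele can be made explicit with a short calculation. The sketch below assumes that the 2n gene copies carried by n diploid plants are drawn independently from the population, which is an idealization (inbreeding or relatedness among sampled plants lowers the effective number of copies). The function names and the 95% confidence level are assumptions introduced for illustration.

```python
import math


def capture_probability(p, n_plants, ploidy=2):
    """Probability that an allele of frequency p appears at least once
    among n_plants, assuming independently sampled gene copies."""
    return 1.0 - (1.0 - p) ** (ploidy * n_plants)


def plants_needed(p, confidence=0.95, ploidy=2):
    """Smallest number of plants giving at least the requested
    probability of capturing an allele of frequency p."""
    copies = math.log(1.0 - confidence) / math.log(1.0 - p)
    return math.ceil(copies / ploidy)


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, round(capture_probability(0.05, n), 4))
    print(plants_needed(0.05), plants_needed(0.01))
```

Under these assumptions, alleles at a frequency of 0.05 are captured with high probability by samples of 50 to 100 plants, whereas alleles that are much rarer require considerably larger samples.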
The number of exclusive alleles found in this study was greater than that found by Perera et al. (2001) who evaluated 330 individuals from 33 populations of tall coconut in the Active Germplasm Bank in Sri Lanka, using eight SSR loci. These authors found four exclusive alleles in two of the 33 populations in a total of 56 alleles, with two of them considered as rare localized alleles and two as rare sporadic alleles.
Classification of the individuals according to the reference population indicates that 128 (66%) of the 195 individuals analyzed were well classified. It was observed that more than half of the individuals that were not classified in their population of origin were found in two groups of populations. The first group was composed of the populations of Baía Formosa, Georgino Avelino and São José do Mipibu, called the Natal group, and the second group was composed of the populations of Japoatã, Pacatuba and Praia do Forte, called the South group.
Considering each of the two groups as a single population, a new classification of the 195 individuals was made, associating them according to the reference population (Table 4). After grouping, the number of well-classified individuals rose from 128 to 161 (83%). Of the 51 individuals that compose the South group, only six were associated preferentially with another population, and of the 69 individuals that belong to the Natal group, only nine were associated with other populations.
After classification in groups, 83% of the individuals were considered well classified; they are represented with numbers in bold print. The number of individuals not associated with their population of origin fell from 67 to 34 (Table 4). These results were used as a first criterion for grouping of the populations. Baudouin et al. (2004) evaluated 74 coconut palms from six populations of tall coconut from the Pacific coast and from Latin America, using the same markers. These authors used the same criterion for association of the individuals and found that 53% of the corresponding probabilities were greater than 95%, and that 33% of the individuals were not associated with their population of origin.
The pattern of genetic divergence among the ten populations of Brazilian tall coconut was also evaluated by Factorial Correspondence Analysis (FCA) and may be visualized in the three-dimensional representation presented in Figure 1. Each axis is labeled with its contribution to total variation, and together the axes account for 63% of the total variation. It is also seen that geographically nearer populations have lower genetic distances.
Based on the FCA results, it is possible to identify one set of populations formed by Baía Formosa (3), Georgino Avelino (4) and São José do Mipibu (5), and another set formed by Japoatã (8), Pacatuba (9) and Praia do Forte (10). The other populations are dispersed in the multidimensional space and do not exhibit any grouping tendency (Figure 1).
The pattern of genetic divergence among the ten populations thus permits grouping of the populations with greater similarity. Populations 3 (Baía Formosa), 4 (Georgino Avelino) and 5 (São José do Mipibu) were genetically similar among themselves, forming a group differentiated from the other populations. The same is observed for populations 8 (Japoatã), 9 (Pacatuba) and 10 (Praia do Forte), which form a second group. The other populations exhibited varied patterns of divergence and showed no consistent tendency to group. This pattern agrees with the classification obtained by the criterion of Rannala and Mountain (1997) and reinforces it, since both analyses yield the same groups. Similar grouping results were found by Ribeiro et al. (2010) using the neighbor-joining method, based on genetic distances and on population structure analyses.
It may be concluded from the present study that microsatellite markers are effective in estimating the level of genetic variation among and within the populations of Brazilian tall coconut, that these populations have a high level of genetic diversity and that the variation within the populations makes the greatest contribution to total variation. This suggests that these populations are in a recent process of differentiation, with a common history, and that the structuring of the populations of Brazilian tall coconut was affected by a strong founder effect and by subsequent human selection. Twelve exclusive alleles were found, distributed in six populations, and these alleles characterize particular populations. Of the 12 exclusive alleles, four are considered common localized alleles and eight rare localized alleles; one rare sporadic allele, shared by three populations, was also found. Classification of the individuals according to the reference population allows the populations to be placed in two groups: the Natal group, formed by the populations of Baía Formosa, Georgino Avelino and São José do Mipibu, and the South group, formed by the populations of Japoatã, Pacatuba and Praia do Forte. The pattern of genetic divergence based on factorial correspondence analysis agrees with the classification based on the reference population and yields the same groups. These results provide an important tool to assist collection activities, to make germplasm conservation more efficient and to support the coconut breeding program in Brazil.