Genetic characterization of cotton landraces found in the Paraíba and Rio Grande do Norte states

The objective of this study was to estimate the genetic diversity of mocó cotton planted in Paraíba and Rio Grande do Norte using microsatellite markers, since mocó landraces are a valuable source of genetic diversity. A set of 38 accessions (21 plants from Rio Grande do Norte and 17 from Paraíba) was analyzed using 24 pairs of cotton microsatellite primers, which amplified 20 polymorphic loci. The average inbreeding coefficient was 0.432, slightly higher in individuals from Paraíba than in those from Rio Grande do Norte. Genetic diversity (Nei's unbiased estimator) within each state's populations had similar values ($H_T$ = 0.327 and 0.302 in Paraíba and Rio Grande do Norte, respectively), indicating that comparable variability has been maintained. Moreover, the proportion of diversity between populations was extremely low ($D_{ST}$ = 0.007) but considerable between mesoregions ($D_{ST}$ = 0.069). These data led us to conclude that the genetic similarity between populations is high.


INTRODUCTION
Mocó cotton comprises landraces of G. hirsutum race marie galante restricted to Northeast Brazil (Giband et al. 2010). The area planted with mocó cotton reached approximately two million hectares during the 1970s and was then drastically reduced, initially by a long series of droughts from 1979 to 1983 and subsequently by the arrival of the boll weevil in 1983 (Menezes et al. 2010). Mocó landraces were planted in the Seridó region in the late nineteenth and early twentieth centuries (Moreira et al. 1994). Local breeding programs started in 1920 and gained importance under the coordination of the Superintendência do Desenvolvimento do Nordeste from 1963 (Moreira et al. 1982) and of Embrapa Cotton from 1975, while landraces continued to be planted (Moreira et al. 1972). Mocó cotton was cultivated as a perennial crop, remaining in the field for around five years and producing around 400 kilograms per hectare per year (Embrapa 1975, 1997). In subsistence farming, fields were usually used as cattle pasture after harvest. Nowadays, mocó is rarely found as a crop but is frequently found as a backyard plant (Menezes et al. 2010).
The most widely cultivated cotton worldwide was developed from a single race, Gossypium hirsutum race latifolium. Mocó cotton, which is sexually compatible with cultivated cotton (Borém et al. 2003), is therefore an important allelic source for broadening the genetic base of upland cotton. Valuable traits such as glandless lines (Stipanovic et al. 2005) or resistance to abiotic stresses could potentially be exploited.
Germplasm collections of wild and domesticated cotton have been made in Brazil since 1920 (Campbell et al. 2010). Despite the maintenance of ex situ collections, diversity is increasingly being lost in situ as a result of changes in economic and cultural habits (Almeida et al. 2009, Menezes et al. 2010), so preservation strategies must be constantly renewed. Knowledge of genetic diversity and structure can be applied both to improve the genetic diversity of breeding populations and to establish measures for in situ and ex situ conservation (Negri and Tiranti 2010, Carvalho et al. 2014).
An analysis of mocó cotton collected in Northeast Brazil with 12 SSR markers showed genetic differentiation of plants collected in Ceará, attributed to a particular breeding program, and of plants collected in Piauí, which showed a marked contribution of G. barbadense, whereas plants collected in Paraíba and Rio Grande do Norte were very similar (Menezes et al. 2010). The aim of this work was to improve the estimate of genetic diversity among plants of Gossypium hirsutum L. race marie galante collected in Rio Grande do Norte and Paraíba states by using a larger number of SSR markers.

MATERIAL AND METHODS
Mocó plants were collected in 2005 (https://www.cnpa.embrapa.br/albrana/) in ten counties of Paraíba and six counties of Rio Grande do Norte (Menezes et al. 2010), which were grouped into mesoregions as defined by the Brazilian Institute of Geography and Statistics (IBGE) (Table 1). Leaf samples were collected from the plants in situ in 50 mL tubes filled with TE buffer (0.5 M Tris-HCl and EDTA, pH 8.0). The tubes were transported to the laboratory and stored frozen at -20 °C until DNA extraction.
Genomic DNA was extracted from leaves, not from seeds as in the previous study of these plants (Menezes et al. 2010). Between 40 and 50 mg of frozen leaf tissue was used to extract DNA with the 2% CTAB (cetyltrimethylammonium bromide) protocol. DNA was quantified on 0.8% (w/v) agarose gels stained with ethidium bromide, by comparison with known quantities of λ phage DNA (50, 100, 200 and 300 ng), and was diluted in TE buffer to a working concentration of 10 ng µL⁻¹.
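Diluting each extract to the 10 ng µL⁻¹ working concentration is a C1V1 = C2V2 calculation. The sketch below is only an illustration: the stock concentration (150 ng µL⁻¹, as if read off the λ DNA standards) and the 100 µL final volume are hypothetical values, not figures from the study.

```python
def dilution_volumes(stock_ng_per_ul: float, final_volume_ul: float,
                     target_ng_per_ul: float = 10.0) -> tuple[float, float]:
    """Return (stock DNA volume, TE buffer volume) in µL from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume

# Hypothetical sample: 150 ng/µL stock, 100 µL of working solution wanted
stock, te = dilution_volumes(150.0, 100.0)
print(f"{stock:.2f} µL DNA + {te:.2f} µL TE")
```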
Amplification used an initial denaturation step at 94 °C for 12 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing for 1 min at the temperature recommended for each primer by Nguyen et al. (2004), and extension at 72 °C for 1 min, with a final extension step of 5 min at 72 °C. PCR products were denatured by adding 10 µL of loading buffer (95% formamide, 10 mM EDTA, 0.5% (w/v) bromophenol blue and xylene cyanol), heated for 5 min at 95 °C and chilled on ice; 5 µL of the denatured products were then loaded on a 6% (w/v) polyacrylamide gel containing 7.0 M urea in 1X TBE buffer. Gels were run for 3 h at 70 W and then silver-stained.
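The thermal profile above can be written down as data, which makes it easy to check the programmed hold time. This is a sketch only: `ANNEALING_C` is a placeholder, because the study used the primer-specific temperature from Nguyen et al. (2004), and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int

def program_hold_time(initial, cycle, n_cycles, final):
    """Total hold time in seconds; temperature ramps are not counted."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))

# Placeholder value: the real annealing temperature depends on each primer pair.
ANNEALING_C = 55.0
initial = [Step("initial denaturation", 94, 12 * 60)]
cycle = [Step("denaturation", 94, 30),
         Step("annealing", ANNEALING_C, 60),
         Step("extension", 72, 60)]
final = [Step("final extension", 72, 5 * 60)]

total = program_hold_time(initial, cycle, 35, final)
print(total, "s =", total / 60, "min")
```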
Because SSRs are codominant markers, the alleles at each locus were designated (1) to (4), and an allelic matrix was constructed. The polymorphism information content (PIC) was calculated for each state separately with the formula $\mathrm{PIC}_j = 1 - \sum_i p_{ij}^2$, where $p_{ij}$ is the frequency of allele $i$ for primer $j$. The correlation between the PIC values of each locus in the two populations was tested with Pearson's linear correlation in the program BioEstat 5.0 (Ayres 2009). Intrapopulation genetic variability was inferred from the allele frequencies, the frequencies of exclusive alleles, the number of polymorphic alleles per locus ($A_p$), and the expected ($H_E$) and observed ($H_O$) heterozygosity. Wright's fixation index f and its 95% confidence interval were estimated from 10,000 bootstrap resamplings with the GDA program (Lewis and Zaykin 2000).
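The per-locus statistics described above can be sketched as follows, assuming genotypes are coded as pairs of allele numbers (1 to 4) with `None` for missing data. This uses the simplified PIC formula from the text, a plain ratio estimator f = 1 - ΣH_O/ΣH_E, and a percentile bootstrap over loci; it is an illustration, not the estimators implemented in GDA, and the genotypes are hypothetical.

```python
import random
from collections import Counter

def allele_freqs(genotypes):
    """Allele frequencies at one locus; genotypes are pairs such as (1, 2), None if missing."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def pic(genotypes):
    """Simplified PIC = 1 - sum_i p_i^2 (numerically equal to the expected heterozygosity)."""
    return 1.0 - sum(p * p for p in allele_freqs(genotypes).values())

def observed_het(genotypes):
    """Proportion of typed individuals that are heterozygous at the locus."""
    typed = [g for g in genotypes if g is not None]
    return sum(g[0] != g[1] for g in typed) / len(typed)

def fixation_index(loci):
    """Multilocus f = 1 - sum(Ho) / sum(He); no small-sample correction."""
    ho = sum(observed_het(g) for g in loci)
    he = sum(pic(g) for g in loci)
    return 1.0 - ho / he

def bootstrap_ci(loci, reps=10_000, seed=1):
    """Percentile 95% interval for f, resampling loci with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.choice(loci) for _ in loci]
        if sum(pic(g) for g in sample) > 0:  # skip draws made only of monomorphic loci
            stats.append(fixation_index(sample))
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats)) - 1]

# Hypothetical genotypes (six plants, three loci); not data from the study.
locus_a = [(1, 1), (1, 1), (2, 2), (1, 2), (1, 1), (2, 2)]
locus_b = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 1), (1, 1)]
locus_c = [(3, 3), (1, 1), (3, 3), (1, 3), (1, 1), None]
loci = [locus_a, locus_b, locus_c]

print([round(pic(g), 3) for g in loci])
f_hat = fixation_index(loci)
lo, hi = bootstrap_ci(loci)
print(f"f = {f_hat:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

Resampling loci rather than individuals follows the usual practice for multilocus fixation indices, since loci are treated as independent replicates of the inbreeding process.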
Genetic composition was analyzed under two groupings of genotypes: first, the accessions collected in each state were treated as a single sample; second, the accessions collected in different mesoregions within each state were treated as subdivisions (Table 1). The distribution of genetic diversity within and among populations was measured with Nei's fixed model (Nei 1973), through the parameters total ($H_T$) and within-population ($H_S$) genetic diversity and the proportion of diversity among populations ($G_{ST}$), using the program FSTAT (Goudet 2001). Hedrick's standardized $G'_{ST}$, calculated with the program SMOGD (Crawford 2010), was used to correct the $G_{ST}$ values.
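For a single locus, Nei's partition and Hedrick's standardization can be sketched as below; multilocus values are usually obtained by averaging the components over loci. This minimal version assumes equal population weights and omits the small-sample bias corrections that programs such as FSTAT and SMOGD apply, and the allele frequencies are hypothetical.

```python
def nei_partition(pop_freqs):
    """Nei's (1973) partition for one locus.

    pop_freqs: one {allele: frequency} dict per population (equal weights).
    Returns (H_S, H_T, D_ST, G_ST) without sample-size corrections.
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # H_S: mean within-population gene diversity
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # H_T: gene diversity of the pooled (mean) allele frequencies
    mean_p = [sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles]
    h_t = 1.0 - sum(p * p for p in mean_p)
    d_st = h_t - h_s
    return h_s, h_t, d_st, d_st / h_t

def hedrick_g_prime_st(g_st, h_s, k):
    """Hedrick's (2005) standardized G'_ST = G_ST (k - 1 + H_S) / ((k - 1)(1 - H_S))."""
    return g_st * (k - 1 + h_s) / ((k - 1) * (1.0 - h_s))

# Hypothetical frequencies for one locus in two populations
pops = [{1: 0.7, 2: 0.3}, {1: 0.6, 2: 0.4}]
h_s, h_t, d_st, g_st = nei_partition(pops)
print(round(h_s, 3), round(h_t, 3), round(d_st, 3), round(g_st, 4))
print(round(hedrick_g_prime_st(g_st, h_s, k=len(pops)), 4))
```

The standardization divides $G_{ST}$ by its maximum attainable value given $H_S$, which is why $G'_{ST}$ is larger than $G_{ST}$ when within-population diversity is high.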

RESULTS AND DISCUSSION
Twenty polymorphic loci were amplified. These loci are distributed over 14 chromosomes of the A and D genomes of allotetraploid cotton, and loci located on the same chromosome are sufficiently distant to allow independent segregation (Nguyen et al. 2004). Markers CIR49, CIR94 and CIR381 each revealed two loci, all of which were polymorphic (Table 2). However, markers CIR09, CIR34, CIR38, CIR105, CIR121, CIR143 and CIR203 were monomorphic in the accessions analyzed. One primer pair amplified a single monomorphic allele, and three primer pairs did not amplify any allele.
A total of 50 alleles were obtained; all of them were observed in Paraíba, but only 48 among the Rio Grande do Norte genotypes. The number of alleles per locus ranged from 2 to 4 in both states (Table 2), and the average number of alleles per locus was very similar between Paraíba (2.45) and Rio Grande do Norte (2.42). Four alleles, revealed by the primer pairs CIR40, CIR49, CIR202 and CIR249, were exclusive to one of the states (Table 2): three were found only in Paraíba and one only in Rio Grande do Norte. Their frequencies were low, ranging from 5.8 to 10%.
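Exclusive alleles and the mean number of alleles per locus follow directly from the per-state allele-frequency tables. In this sketch the locus names `L1` and `L2` and their frequencies are hypothetical; an allele counts as exclusive when its frequency is above zero in one state and zero (or absent) in the other.

```python
def exclusive_alleles(freqs_a, freqs_b):
    """Alleles present in one group but absent from the other, per locus.

    freqs_a, freqs_b: {locus: {allele: frequency}} for the two states.
    Returns {locus: {"a": {allele: freq}, "b": {allele: freq}}} for loci with private alleles.
    """
    result = {}
    for locus in freqs_a.keys() | freqs_b.keys():
        fa, fb = freqs_a.get(locus, {}), freqs_b.get(locus, {})
        only_a = {al: p for al, p in fa.items() if p > 0 and fb.get(al, 0) == 0}
        only_b = {al: p for al, p in fb.items() if p > 0 and fa.get(al, 0) == 0}
        if only_a or only_b:
            result[locus] = {"a": only_a, "b": only_b}
    return result

def mean_alleles_per_locus(freqs):
    """Average count of alleles with non-zero frequency across loci."""
    return sum(sum(p > 0 for p in f.values()) for f in freqs.values()) / len(freqs)

# Hypothetical frequencies for two loci in each state
paraiba = {"L1": {1: 0.9, 2: 0.1}, "L2": {1: 0.5, 2: 0.5}}
rio_grande = {"L1": {1: 1.0}, "L2": {1: 0.4, 2: 0.5, 3: 0.1}}

print(exclusive_alleles(paraiba, rio_grande))
print(mean_alleles_per_locus(paraiba), mean_alleles_per_locus(rio_grande))
```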
These exclusive alleles could be a sampling effect or, more likely, the result of recent genetic isolation, since seed exchange may be less common than in the past, when extensive cultivation required considerable care with diversity and productivity, especially before breeding programs were developed. These alleles have not been reported previously (Menezes et al. 2010) and can be used in studies of population differentiation and, as shown by Petit et al. (1998), in choosing genotypes or populations to prioritize for preservation. A similar number of exclusive alleles was observed in Gossypium barbadense (Almeida et al. 2009), an ancestor species of mocó cotton also grown in Northeast Brazil. Allele frequencies differed little between the two states, and the most common allele at each locus was the same in both, except at loci CIR97 and CIR381.
At most loci, some alleles had very low frequencies while others approached 1.00 (Table 2). An even distribution of allele frequencies in a population may indicate higher genetic diversity and, according to Hedrick (2005), greater protection against the effects of genetic drift than in populations whose alleles have very unequal frequencies. Evaluating allele frequencies is therefore important, since it can reveal the effects of genetic drift, fixation or allele loss and can also guide genotype selection. The data obtained may thus serve as a reference for studies that monitor and multiply the diversity of mocó cotton planted in the semi-arid Northeast.
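Why rare alleles are the first casualties of drift can be shown with one generation of binomial sampling: an allele at frequency p is missing from the next generation with probability (1 - p)^{2N}. This sketch assumes an idealized randomly mating (Wright-Fisher) population of hypothetical size N = 10, which mocó plants, being largely selfing, do not follow; it only illustrates the direction of the effect for frequencies like the 5.8 to 10% of the exclusive alleles.

```python
def prob_allele_lost(p, n_diploid):
    """Probability that an allele at frequency p is absent after one generation
    of Wright-Fisher sampling of 2N gene copies."""
    return (1.0 - p) ** (2 * n_diploid)

# Hypothetical population of 10 diploid plants
for p in (0.05, 0.10, 0.50):
    print(p, round(prob_allele_lost(p, n_diploid=10), 4))
```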
The polymorphism information content (PIC) per locus, estimated from allele frequencies, was on average higher among genotypes collected in Paraíba (0.344) than among those collected in Rio Grande do Norte (0.293). Considering only Paraíba genotypes, the highest PIC value was obtained for locus CIR381b (0.532) and the lowest for loci CIR107 and CIR185 (0.059); considering only Rio Grande do Norte collections, the most informative locus was CIR381a (0.556) and the least informative CIR251 (0.121) (Table 2). Markers with higher PIC values can be chosen for further studies, since they are more likely to detect polymorphism in these or other populations. The PIC values were also highly correlated between the two states (r = 0.80; 95% CI 0.56 to 0.92; p < 0.0001) (Figure 1). Loci CIR94, CIR246 and CIR249, which had PIC values above the average for microsatellite markers when evaluated by Lacape et al. (2007), showed values in these Brazilian mocó populations (Table 2) similar to those of loci CIR179 and CIR107, which had below-average PIC values in that study. Loci CIR246 and CIR249 had higher PIC values than the loci previously used for mocó cotton populations (Menezes et al. 2010).
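The reported interval for the PIC correlation can be reproduced approximately with the Fisher z-transformation. This sketch assumes n = 20 (one pair of PIC values per polymorphic locus); the exact procedure used by BioEstat is not stated, so the result is only a consistency check. With r = 0.80 it gives roughly 0.55 to 0.92, close to the reported 0.56 to 0.92, the small gap plausibly coming from rounding of r.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def pearson_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing r = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

lo, hi = pearson_ci(0.80, 20)   # r from the text; n = 20 loci assumed
print(f"95% CI {lo:.2f} to {hi:.2f}; t = {pearson_t(0.80, 20):.2f}")
```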
The populations of the two states are similar (average Nei's genetic distance of 0.017), indicating that comparable diversity has been maintained. Moreover, genetic differentiation between the populations was very low ($G'_{ST}$ = 0.027) and not significantly different from zero according to its 95% confidence interval (-0.012 to 0.090). The mocó plants collected can therefore be considered a single population, consistent with the absence of contrasting allele frequencies (Tables 2 and 3) and with previous SSR data (Menezes et al. 2010). These values corroborate the high similarity between specimens from the two states observed for morphological markers (seed type, leaf color, flower spot, presence of linter and fiber color), available at https://www.cnpa.embrapa.br/albrana. Although small populations that reproduce by selfing are expected to differ greatly from one another, the small genetic differentiation between the states supports the information obtained during in situ expeditions: farmers exchange genetically similar seeds (gene flow), which homogenizes genetic diversity among populations.
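Nei's genetic distance compares the probability of drawing identical alleles between populations with the probabilities within each one. The sketch below implements Nei's standard distance, averaging the J terms over loci; the text does not state whether the standard or the unbiased variant was used, and the frequencies are hypothetical.

```python
import math

def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x, freqs_y: lists with one {allele: frequency} dict per locus.
    Jx, Jy and Jxy are averaged over loci before the ratio is taken.
    """
    n_loci = len(freqs_x)
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))

# Hypothetical single-locus frequencies for two populations
x = [{1: 0.7, 2: 0.3}]
y = [{1: 0.6, 2: 0.4}]
print(round(nei_distance(x, y), 4))
```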
A deficit of heterozygous genotypes was indicated by the observed heterozygosity (0.178), which was lower than the expected heterozygosity (0.315), resulting in a positive and significant total fixation index (F = 0.445, Table 3). This high inbreeding coefficient is probably due to reproduction by selfing (f = 0.432) rather than to fragmentation associated with the geographical distribution of the cotton plants in the two states ($G'_{ST}$ = 0.027) (Table 3).
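The claim that selfing, not between-state structure, drives the total inbreeding can be checked with Wright's identity 1 - F = (1 - f)(1 - F_ST). In this sketch F_ST is approximated by $G_{ST} = D_{ST}/H_T$ computed from the values reported in the text (an assumption; GDA estimates its own among-population component). The result, about 0.444, is close to the reported F = 0.445, so almost all of F comes from f.

```python
def total_fixation(f_is, f_st):
    """Wright's identity 1 - F_IT = (1 - F_IS)(1 - F_ST), solved for F_IT."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)

f_within = 0.432              # f reported in the text
g_st = 0.007 / 0.325          # D_ST / H_T reported in the text, used as a stand-in for F_ST
print(round(total_fixation(f_within, g_st), 3))
```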
The small differences between the states could be explained by substantial historical gene flow ($N_m$ = 10.7). Although the decline of cotton cultivation over the past few decades may have reduced seed exchange, this has not been enough to markedly alter the genetic composition of the states. Some seed exchange, even for backyard planting, seems to persist.
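The text does not state how $N_m$ was estimated. Assuming Wright's island-model approximation $N_m = (1 - F_{ST}) / (4 F_{ST})$, the sketch below inverts it to show which level of differentiation corresponds to $N_m$ = 10.7 (about 0.023, of the same order as the $G_{ST}$ and $G'_{ST}$ values reported), and what $G'_{ST}$ = 0.027 alone would imply.

```python
def island_model_nm(f_st):
    """Wright's island-model approximation Nm = (1 - F_ST) / (4 F_ST)."""
    return (1.0 - f_st) / (4.0 * f_st)

def implied_f_st(nm):
    """Invert the same approximation: F_ST = 1 / (4 Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)

print(round(implied_f_st(10.7), 4))       # differentiation that would give Nm = 10.7
print(round(island_model_nm(0.027), 1))   # Nm implied by G'_ST = 0.027
```

The island model assumes equilibrium between drift and migration among equal-sized populations, so these numbers are rough orders of magnitude rather than migrant counts.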
According to Nei's diversity indices, total genetic diversity was high ($H_T$ = 0.325). However, most of this variability was found within the state populations ($H_S$ = 0.318), and diversity between them was extremely low ($D_{ST}$ = 0.007), about 2% of the total diversity.
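The reported components fit Nei's additive partition, and the share of diversity between states follows directly from them:

```latex
H_T = H_S + D_{ST} = 0.318 + 0.007 = 0.325
\qquad
G_{ST} = \frac{D_{ST}}{H_T} = \frac{0.007}{0.325} \approx 0.022 \;(\approx 2\%)
```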
When accessions were grouped by mesoregion, the total genetic diversity in each state was very similar, slightly higher in Paraíba ($H_T$ = 0.327) than in Rio Grande do Norte ($H_T$ = 0.302). The diversity among the sampled mesoregions within each state was considerable, averaging 0.069. The lowest values were observed in Sertão Paraibano and Central Potiguar and the highest in Borborema and Oeste Potiguar (Table 3). It may be concluded that the history of distribution of local varieties and commercial cultivars, which was similar in both states (Moreira et al. 1989), led to a highly uniform genetic distribution that largely persists. On the other hand, the decline of cotton cultivation decades ago may have initiated a process of genetic erosion through a genetic bottleneck.
Microsatellite markers help verify the distribution of genetic diversity and therefore guide the search for contrasting genotypes, which should be included in conservation of the species' diversity. As demonstrated in other studies of the genetic diversity of cotton populations, SSR markers have helped to design conservation strategies (Almeida et al. 2009, Barroso et al. 2010). The high diversity within states and the low genetic differentiation between them indicate that conservation of mocó cotton plants should not prioritize one state to the detriment of the other. Moreover, because the average $G'_{ST}$ among mesoregions was considerable, it is likewise not appropriate to prioritize the conservation of plants from one state over those from the other. In situ conservation of these cotton plants is possible, provided that farm owners commit to it. This strategy is risky because it depends on environmental conditions and on the farmers' commitment, but Barroso et al. (2010) and Menezes et al. (2014) have observed satisfactory results for natural populations of G. mustelinum. For adequate preservation, the mocó cotton germplasm found in the two states was deposited in the cotton seed bank at Embrapa for future studies.

Figure 1. Correlation between the PIC values of each locus in the populations of Paraíba and Rio Grande do Norte states

Table 1.
Geographic location and number of collected genotypes in each county, by mesoregion, in Paraíba (Borborema and Sertão Paraibano) and Rio Grande do Norte (Central Potiguar and Oeste Potiguar) states
Crop Breeding and Applied Biotechnology 15: 26-32, 2015. IPP Menezes et al.

Table 2.
Number of alleles per locus (A), estimated allele frequencies and polymorphism information content (PIC) for 20 SSR loci in two G. hirsutum r. marie galante populations from the states of Paraíba and Rio Grande do Norte

Table 3.
Descriptors of genetic diversity. $H_O$, mean observed heterozygosity; $H_E$, mean expected heterozygosity; f, fixation index within populations; ¹$G'_{ST}$, genetic differentiation among populations; F, fixation index over all populations; ²$G'_{ST}$, genetic differentiation among populations of accessions grouped by mesoregion