MOLECULAR CHARACTERIZATION OF PAPAYA GENOTYPES USING AFLP MARKERS

Due to the low genetic variability reported in commercial plantations of papaya (Carica papaya L.), the objective of this study was to analyze the genetic diversity of 32 genotypes, including cultivars, landraces, inbred lines, and improved germplasm, using the Amplified Fragment Length Polymorphism (AFLP) technique. The genetic distance matrix was obtained using the Nei and Li genetic distance, and clustering was performed using the unweighted pair group method with arithmetic mean (UPGMA). Using 11 combinations of EcoRI/MseI primers, 383 polymorphic bands were obtained, an average of 34.8 polymorphic bands per primer combination. Five clusters were formed. The traditional cultivar ‘Sunrise’ and the inbred line CMF-L30-08 were the closest genotypes, and the improved germplasm CMF041 and the landrace CMF233 were the most distant. The main papaya cultivars commercially grown in Brazil, as well as four inbred lines and three improved germplasm, were clustered together; however, they were not grouped in the same branch. The genetic distance between the Sunrise and Golden cultivars was 0.329; although Golden arose from mutation and selection within the Sunrise variety, it retains considerable genetic variability. Additional variability was observed in the inbred lines derived from the papaya breeding program at Embrapa Cassava and Fruits.


INTRODUCTION
Papaya (Carica papaya L.) research is essential in Brazil because the country is one of the main producers worldwide. Brazil produces approximately 1.9 million tons, with a strong share of the world market (20.9%). The main producers are India, Brazil, Nigeria, Indonesia and Mexico, in that order (FAOSTAT, 2008).
Currently, the most widely grown papaya varieties belong to the 'Solo' and 'Formosa' commercial groups. The varieties from the 'Solo' group are grown in many regions of the world and produce fruit that is desirable for export, with reddish pulp, small size and weight ranging from 300 to 650 g. Fruits from the 'Formosa' group have reddish pulp and medium size (1000 to 1300 g), and consist of commercial hybrids that have gained space in both domestic and foreign markets, with strong growth in sales to Europe, Canada and the United States.
Only a few cultivars of these two groups occupy most of the commercial plantations, which hinders papaya development because of limited genetic variability. Furthermore, the plants become more vulnerable to pest and disease outbreaks, which might impair the production of suitable fruit for domestic and foreign markets (OLIVEIRA et al., 2010a). On the other hand, several landraces selected by farmers have been cultivated in many regions of Brazil, and several inbred lines have been developed by Embrapa Cassava and Fruits (CNPMF) through germplasm hybridization and selection. Despite this effort, few studies have assessed the genetic variability of these genotypes with a view to their effective use in breeding programs and cultivar development. Nevertheless, according to SILVA et al. (2008a), genetic parameters related to morphoagronomic and fruit quality traits in a segregating population showed wide genotypic variability. Consequently, genetic gain estimates for commercial fruit yield were obtained (SILVA et al., 2008b).
Morphological and agronomic characterization is usually an affordable way to estimate genetic diversity and is essential for genotype discrimination and identification of duplicates. The major limitation of morpho-agronomic characterization is that many of the traits used have polygenic inheritance and are influenced by the environment. In addition, the identification of papaya cultivars using morphological traits is usually not possible until fruit production.
To overcome this problem, techniques that determine variation directly at the DNA level have greatly contributed to the assessment of relationships between genotypes. In the case of papaya, some studies have characterized genetic diversity using isozymes (MORSHIDI, 1998), Random Amplified Polymorphic DNA (RAPD) (STILES et al., 1993; SONDUR et al., 1996; JOBIN-DÉCOR et al., 1997), Restriction Fragment Length Polymorphism (RFLP) (ARADHYA et al., 1999), Amplified Fragment Length Polymorphism (AFLP) markers (KIM et al., 2002; VAN DROOGENBROECK et al., 2002; RATCHADAPORN et al., 2007), PCR-RFLP (VAN DROOGENBROECK et al., 2004), Inter-Simple Sequence Repeats (ISSR) (CARRASCO et al., 2009) and Simple Sequence Repeats (SSR) (OLIVEIRA et al., 2010a, 2010b). However, most of these studies are concerned with the diversity in germplasm collections without immediate commercial value for cultivar release.
Among the different molecular markers described, the AFLP technique is considered a useful method for detecting large numbers of polymorphic bands in DNA from various origins without prior sequence information, providing extensive sampling of the genome, reproducibility and a high number of data points per gel (VOS et al., 1995). Therefore, the main objective of this study was the molecular characterization of cultivars, landraces, improved germplasm and inbred lines from Embrapa Cassava and Fruits, aiming to increase knowledge of the genetic variability, and of how it is structured, in commercial and improved genotypes.

Plant material and DNA extraction
Eight improved germplasm from the Active Papaya Germplasm Bank (APGB) at Embrapa Cassava and Fruits, eleven varieties, one Brazilian hybrid, seven inbred lines and five landraces collected in the main producing regions of Bahia were evaluated (Table 1). Young papaya leaves were harvested and stored at -80 °C. DNA was extracted according to the procedure described by DOYLE & DOYLE (1990). DNA was quantified on an agarose gel (1.0% w/v) stained with ethidium bromide (1.0 mg/mL) by comparing the fluorescence intensity of each sample with that of a dilution series of commercial Lambda DNA (Invitrogen, Carlsbad, CA) of known concentration.

AFLP genotyping
The AFLP genotyping was performed using the protocol described by VOS et al. (1995), with some modifications. Briefly, genomic DNA (250 ng) was digested with a combination of a rare-cutting (EcoRI) and a frequent-cutting (MseI) enzyme. Following digestion, double-stranded adaptors were ligated to the ends of the DNA fragments and the product was diluted at a ratio of 1:5, generating template DNA for subsequent PCR amplification (pre-amplification followed by a selective amplification step). Afterwards, the DNA was pre-amplified using the selective-base combination E+A/M+C, where E = EcoRI adapter, M = MseI adapter, A = adenine and C = cytosine. Then, the pre-amplified DNA was diluted at a ratio of 1:50.
The DNA was amplified using primers with three selective bases for the rare- and frequent-cutting enzymes [E+ACT combined with M+CAA, M+CAC, M+CAT, M+CTA, M+CTC, M+CTG and M+CTT; and E+AAC combined with M+CCA, M+CAC, M+CAG and M+CAT], giving 11 primer combinations. After cycling, fragments were electrophoresed on a 6% (w/v) denaturing polyacrylamide gel in a Hoefer SQ3 DNA sequencer gel electrophoresis unit (Pharmacia Biotech Inc., San Francisco, CA) at 70 W for 2.5 h. The gels were stained with silver nitrate, according to CRESTE et al. (2001). A 50-bp ladder (New England Biolabs, Inc., Beverly, MA) was used as the molecular-weight standard to estimate the size of the AFLP loci.
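As a quick check of the count given above, the selective-amplification pairings can be enumerated from the extensions as they are listed in the text. This is only an illustrative sketch; the names `ECO_TO_MSE` and `primer_combinations` are hypothetical and are not part of the original protocol.

```python
# Selective-base extensions exactly as listed in the text
# (E = EcoRI primer, M = MseI primer). Names are illustrative only.
ECO_TO_MSE = {
    "ACT": ["CAA", "CAC", "CAT", "CTA", "CTC", "CTG", "CTT"],
    "AAC": ["CCA", "CAC", "CAG", "CAT"],
}


def primer_combinations(eco_to_mse):
    """Return labels such as 'E-ACT x M-CAA' for every EcoRI/MseI pairing."""
    return [
        f"E-{eco} x M-{mse}"
        for eco, extensions in eco_to_mse.items()
        for mse in extensions
    ]


combos = primer_combinations(ECO_TO_MSE)
print(len(combos))  # number of primer combinations
for label in combos:
    print(label)
```

Seven MseI extensions paired with E+ACT and four paired with E+AAC give the 11 combinations reported.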

AFLP Data Analysis
Only highly reproducible polymorphic bands were scored as markers: present (1) or absent (0). A genetic similarity (GS) matrix based on the Nei and Li coefficient (NEI & LI, 1979) was computed for all combinations of primers and the 32 genotypes, based on the number of shared amplified bands, according to the estimator $GS_{XY} = 2N_{XY}/(N_X + N_Y)$, where $N_{XY}$ is the number of bands shared by accessions X and Y, $N_X$ is the number of bands in accession X, and $N_Y$ is the number of bands in accession Y. The genetic distance was computed as $GD_{XY} = 1 - GS_{XY}$ using the TREECON software package (VAN DE PEER; DE WACHTER, 1994). Dendrograms were generated using the unweighted pair group method with arithmetic mean (UPGMA), and the robustness of the UPGMA trees was evaluated by bootstrapping (1000 bootstrap replicates) using TREECON. Furthermore, AFLP marker diversity was measured using the polymorphism information content (PIC) implemented in the PowerMarker software (LIU; MUSE, 2005).
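The Nei and Li estimator and the derived distance can be illustrated with a minimal sketch in plain Python. The study itself used TREECON; the function names and the three toy band profiles below are hypothetical and serve only to show the arithmetic.

```python
def nei_li_similarity(x, y):
    """GS_XY = 2 * N_XY / (N_X + N_Y) for two 0/1 band profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same loci")
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # N_XY
    total = sum(x) + sum(y)                                     # N_X + N_Y
    if total == 0:
        raise ValueError("similarity is undefined when neither profile has bands")
    return 2 * shared / total


def nei_li_distance(x, y):
    """GD_XY = 1 - GS_XY."""
    return 1.0 - nei_li_similarity(x, y)


def distance_matrix(profiles):
    """Return a nested dict: matrix[a][b] is the distance between genotypes a and b."""
    names = list(profiles)
    return {
        a: {b: nei_li_distance(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Hypothetical toy profiles (1 = band present, 0 = band absent); not data from the study.
profiles = {
    "G1": [1, 1, 0, 1, 0, 1],
    "G2": [1, 1, 0, 0, 0, 1],
    "G3": [0, 1, 1, 0, 1, 0],
}
matrix = distance_matrix(profiles)
print(round(matrix["G1"]["G2"], 3))
```

In the toy data, G1 and G2 share three bands out of four and three, so GS = 6/7 and GD = 1/7. Shared absences do not enter the estimator, which is why it is commonly used for dominant markers such as AFLP.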

RESULTS AND DISCUSSION
AFLP polymorphism
A total of 558 fragments were scored, with an average of 68.63% polymorphic bands per primer combination. Each primer combination generated an average of 34.8 polymorphic bands, ranging in size from 100 to 1380 base pairs (bp). The percentage of polymorphism detected by individual primer combinations ranged from 58.21% for E-ACT x M-CTT to 84.21% for E-ACT x M-CAC (Table 2). The total number of polymorphic loci amplified within cultivars was 251, ranging from 15 (E-ACT x M-CAT) to 30 (E-ACT x M-CTT) per primer combination. The total number of polymorphic loci amplified within improved germplasm was 192, ranging from 8 (E-ACT x M-CAT) to 23 (E-AAC x M-CAG and E-AAC x M-CAT) per primer combination. In the case of inbred lines and landraces, the total number of polymorphic loci was 172 and 168, respectively. The primer combinations E-AAC x M-CAT and E-AAC x M-CAG generated higher polymorphism (Table 2).
A small number of primer combinations was sufficient to obtain highly polymorphic amplified fragments and to discriminate the tested genotypes easily. Among the tested primers, E-ACT x M-CTG, E-ACT x M-CTT, E-AAC x M-CAA, E-AAC x M-CAC, E-AAC x M-CAG and E-AAC x M-CAT were important combinations for screening germplasm, landrace accessions and inbred lines, and for other studies such as genetic mapping and marker-assisted selection. These results are in agreement with those for other crops, such as grapevine (ERGÜL et al., 2010), where 4 single and 20 double AFLP-primer combinations successfully discriminated 20 rootstocks, and soybean, where it was possible to discriminate each of 44 genotypes using five AFLP-primer combinations (SINGH et al., 2010).
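The summary figures can be cross-checked with simple arithmetic, taking the 383 polymorphic bands reported in the abstract together with the 558 scored fragments and 11 primer combinations. The variable names are illustrative. Note that the pooled percentage computed here need not equal the reported 68.63% exactly, since that value is described as an average over primer combinations.

```python
# Values taken from the text; variable names are illustrative.
total_fragments = 558
polymorphic_bands = 383
n_primer_combinations = 11

# Mean number of polymorphic bands per primer combination.
mean_polymorphic = polymorphic_bands / n_primer_combinations

# Pooled percentage of polymorphic bands over all scored fragments.
pooled_percent = 100 * polymorphic_bands / total_fragments

print(round(mean_polymorphic, 1))
print(round(pooled_percent, 2))
```

The mean rounds to the reported 34.8 bands per combination, and the pooled percentage falls close to the reported per-combination average.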

Genetic diversity
PIC values ranged from 0.14 (E-AAC x M-CAA, E-AAC x M-CAC and E-AAC x M-CAG) to 0.24 (E-ACT x M-CAA), and the average genetic diversity (mean PIC value) among papaya genotypes for the complete set of AFLP markers was 0.17. A large proportion of the markers presented high discrimination power. Only a few markers showed a very low PIC value, because bands that were absent from, or present in, fewer than three genotypes were not included in the analysis.
A genetic distance matrix based on the Nei and Li method was used to establish the level of relatedness between the cultivars, landraces, inbred lines and improved germplasm. Pair-wise estimates of genetic distance ranged from 0.328 to 0.942. The traditional cultivar 'Sunrise' and an inbred line (CMF-L30-08) were the closest genotypes. The highest genetic distance was obtained between an improved germplasm (CMF041) and a landrace (CMF233). The mean genetic distance was 0.735.
Several authors (VAN DROOGENBROECK et al., 2002; OCAMPO PÉREZ et al., 2007; OLIVEIRA et al., 2010b) reported richness and high diversity of C. papaya resources in several regions, mainly analyzing natural populations and other genera of the Caricaceae family. In contrast, this kind of information about cultivars and genotypes selected over many years of breeding is still scarce. However, the results obtained by KIM et al. (2002), who analyzed 71 papaya cultivars, breeding lines, unimproved germplasm and related species, suggested limited genetic variation in papaya, with an average genetic similarity of 0.88 among 63 papaya accessions. Similar results were obtained with AFLP analysis of cultivars from Thailand, which were closely related (RATCHADAPORN et al., 2007). These authors categorized thirty papaya cultivars into six groups, with high similarity coefficients ranging from 0.73 to 0.92.
The AFLP technique was used to assess the genetic relationships among cultivated papaya and related species native to Ecuador (Vasconcellea sp. and Jacaratia sp.) (VAN DROOGENBROECK et al., 2002). Within the C. papaya accessions in particular, limited variation was detected, owing to the high level of similarity among these genotypes. This result is in agreement with other studies (STILES et al., 1993; MORSHIDI, 1998).
In general, breeding can reduce genetic variability compared with wild accessions (MACHON et al., 1996), but considerable diversity was observed in the improved genotypes using the AFLP technique. Moreover, it must be emphasized that the genetic diversity detected in our samples reflects the strong morphological variation observed in the field, particularly for fruit traits and plant type.
The high level of polymorphism in our genotypes is in agreement with the results of OLIVEIRA et al. (2010a, 2010b), who used microsatellite markers and found a high level of polymorphism in papaya, especially among improved and unimproved germplasm. Although the selection of papaya with desirable agronomic traits by farmers or breeders could reduce genetic variability, broad genetic diversity was observed even among modern cultivars such as 'Sunrise' and 'Golden'. The degree of variation between these two 'Solo' cultivars was not expected, since they have quite similar pedigrees.
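For readers unfamiliar with PIC, the sketch below shows one common way to compute it for a presence/absence marker, under a loudly stated assumption: the band-present and band-absent classes are treated as two "alleles" with frequencies f and 1 - f, and the standard formula PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² is applied. This is a simplification for illustration and is not necessarily how PowerMarker handled the AFLP data in this study; the function names and the toy columns are hypothetical.

```python
def band_frequency(column):
    """Fraction of genotypes in which the band is present (column of 0/1 scores)."""
    return sum(column) / len(column)


def pic_two_class(column):
    """PIC for a 0/1 marker, ASSUMING two classes with frequencies f and 1 - f.

    PIC = 1 - (p1^2 + p2^2) - 2 * p1^2 * p2^2
    """
    f = band_frequency(column)
    p1, p2 = f, 1.0 - f
    return 1.0 - (p1 ** 2 + p2 ** 2) - 2 * (p1 ** 2) * (p2 ** 2)


def mean_pic(columns):
    """Average PIC over a list of marker columns."""
    return sum(pic_two_class(c) for c in columns) / len(columns)


# Hypothetical marker columns scored across four genotypes; not data from the study.
balanced = [1, 0, 1, 0]      # band in half of the genotypes
rare = [1, 0, 0, 0]          # band in one genotype only
monomorphic = [1, 1, 1, 1]   # band in every genotype

print(pic_two_class(balanced), pic_two_class(rare), pic_two_class(monomorphic))
```

Under this two-class assumption the value peaks at 0.375 when the band is present in half of the genotypes and falls to zero for a monomorphic band, which is consistent with rare or near-fixed bands contributing very low PIC values.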

Cluster analysis
The pair-wise genetic distances of the accessions were calculated in order to cluster the C. papaya genotypes using the UPGMA algorithm, which resulted in five distinct clusters (Figure 1). The first cluster (A) contains only commercial-type genotypes, such as the 'Calimosa' hybrid, the 'Golden' and 'Sunrise' cultivars, four inbred lines (CMF-L30-08, CMF-L48-08, CMF-L75-08 and CMF-L90-08), and three landraces (CMF233, CMF234 and CMF235) that have been cultivated in different orchards in Brazil. Within this cluster, the lowest genetic distance was observed between the CMF-L30-08 inbred line and the 'Sunrise' cultivar (0.328), and the highest between CMF233 and CMF-L48-08 (0.663).
These results reinforce the hypothesis that the landraces selected within papaya hybrids and commercial cultivars grown on different farms might have diverged through inbreeding and selection, since each farmer selects and keeps his own seeds. One possible explanation for the high variation is that the original 'Tainung nº 1' hybrid, widely used in Brazil, was selfed, and the subsequent inbreeding and selection may have significantly changed its genetic composition, generating the CMF233, CMF234 and CMF235 landraces.
The following two cultivars are interrelated: 'Golden' was obtained by selection within the 'Sunrise' cultivar, but shows many phenotypic differences, such as pulp firmness, yield and yellowish leaf color. The 'Calimosa' hybrid and the 'Golden' cultivar showed a genetic distance of only 0.37, even though the phenotypic contrast between these genotypes is large. Although the parents of the hybrid are unknown, at least one of them is known to belong to the 'Solo' group. Therefore, it is possible that the 'Golden' and 'Sunrise' cultivars are part of the pedigree of the hybrid.
Hence, classification by AFLP markers is more reliable than morphological classification, in which characters are influenced by environmental effects (CERVERA et al., 1998). Historical and/or geographic information, which has been the basis of much variety classification, is limited by the reliability of field information when research data are lacking.
The second cluster (B) contains two inbred lines (CMF-L12-08 and CMF-L62-08). Cluster C, which contains only improved germplasm (CMF008, CMF020, CMF012 and CMF018), also showed a high level of genetic diversity. The genetic distance ranged from 0.372 between CMF012 and CMF018 to 0.581 between CMF008 and CMF020. The fourth cluster (D) comprises two improved germplasm (CMF041 and CMF065) with a genetic distance of 0.525. The fifth cluster (E) contains two cultivars (CMF123 and CMF128) with a genetic distance of 0.572.
The high discriminating power of the AFLP data made it possible to group other improved germplasm (clusters C and D), two inbred lines (cluster B) and two cultivars (cluster E), but not one inbred line (CMF-L88-08), seven cultivars (CMF154, CMF024, CMF087, CMF021, CMF088, CMF092 and CMF078), two improved germplasm (CMF040 and CMF074) and one landrace (CMF232). These genotypes were not grouped into any cluster because of their high genetic variability, based on the Nei and Li method and AFLP polymorphism. Except for the improved germplasm CMF078, none of the ungrouped cultivars originated in or is cultivated in Brazil (Table 1). The major genotypes grown in the country are present in cluster A, indicating selection pressure for agronomic traits such as high yield, orange to reddish flesh and high Brix.
These results contrast with those reported for other species, such as wheat, in which narrow genetic diversity among landraces and modern cultivars was found using AFLP (SOLEIMANI et al., 2002; SHOAIB; ARABI, 2006), and even with those reported for papaya cultivars (VAN DROOGENBROECK et al., 2002; RATCHADAPORN et al., 2007). The variability reported in diversity studies is usually proportional to sample size, and some of the differences seen here may be attributed to differences in sampling and plant material.
The AFLP technique is more efficient, has better resolution for assessing genetic diversity, is inexpensive and requires less labor than the PCR-RFLP technique (VAN DROOGENBROECK et al., 2004). In addition, the number of AFLP polymorphic bands obtained per primer combination is higher than that reported for RAPD markers (JOBIN-DÉCOR et al., 1997; STILES et al., 1993; SONDUR et al., 1996). In the present study, an average of 34.8 polymorphic AFLP markers per primer combination was obtained. The reproducibility of AFLP data makes this approach valuable for investigating genetic diversity in any species.
A substantial level of genetic variation still exists within papaya cultivars, as detected by AFLP. The results of this genetic diversity study provide estimates of the level of genetic variation among diverse materials that can be used in germplasm management, varietal protection and papaya improvement.
The genetic diversity found with AFLP markers was not sufficient to separate cultivars, inbred lines, landraces and improved germplasm into distinct clusters by category, but it was sufficient to distinguish between genotypes for future varietal protection. In addition, the estimates of genetic similarity are particularly useful for choosing widely divergent parents with desirable traits for genetic mapping and selection.
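The UPGMA procedure used for the dendrogram can be sketched in a few lines of plain Python: at each step the two clusters with the smallest average pair-wise distance are merged. This is an illustrative re-implementation, not the TREECON routine used in the study, and the four-item distance table below is hypothetical rather than taken from Figure 1.

```python
def symmetric(pairs):
    """Build a nested dict dist[a][b] from {(a, b): d}, filling both directions."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def upgma(names, dist):
    """Return the list of merges as (left_members, right_members, merge_distance)."""

    def average(c1, c2):
        # Unweighted mean of all item-to-item distances between two clusters.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    active = [(name,) for name in names]  # every genotype starts as its own cluster
    merges = []
    while len(active) > 1:
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                d = average(active[i], active[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        left, right = active[i], active[j]
        merges.append((left, right, d))
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        active.append(left + right)
    return merges


# Hypothetical distances among four items; not values from the study.
toy = symmetric({
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.7,
    ("B", "C"): 0.5, ("B", "D"): 0.8, ("C", "D"): 0.3,
})
merges = upgma(["A", "B", "C", "D"], toy)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In the toy example A and B join first, then C and D, and the two pairs are finally joined at the mean of the four between-pair distances. Bootstrap support, as reported in Figure 1, would require resampling the marker columns and repeating the clustering, which is omitted here.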
*The core sequences of primers for the selective amplification were as follows: E- = 5'-GACTGCGTACCAATTTC-3' for EcoRI primers; M- = 5'-GATGAGTCCTGAGTAA-3' for MseI primers. Each primer contained 3 selective nucleotides at the 3' end. **Numbers in parentheses refer to the bootstrap standard error.

CONCLUSIONS
AFLP markers were highly effective in detecting polymorphism in papaya genotypes, allowing high genetic variability to be estimated even among the main cultivars used by farmers in Brazil. From our survey of diversity with AFLP markers, it is possible to conclude that the papaya breeding program at CNPMF enabled the development of inbred lines with good yield (data not shown) and high genetic diversity. This is especially true of the genotypes CMF-L12-08, CMF-L62-08 and CMF-L88-08, which are strongly differentiated from commercial cultivars and therefore increase the options for cultivars to be used in the production system.

FIGURE 1 - UPGMA-based dendrogram showing genetic relationships among the 32 papaya genotypes used in this study. The dendrogram was based on the genetic distance calculated according to the Nei and Li coefficient, with 1000 bootstrap replicates (only bootstrap values above 30% are shown to support the consistency at a node). G = improved germplasm; L = landrace; C = cultivar; H = hybrid; IL = inbred line.

TABLE 2 - Information generated by 11 primer combinations used to detect AFLP polymorphism among 32 genotypes of C. papaya, and between cultivars, improved germplasm, inbred lines and landraces.