Genetic relationship in Coffea species and parentage determination of interspecific hybrids using ISSR (Inter-Simple Sequence Repeat) markers

Inter-simple sequence repeat (ISSR) markers were used to evaluate genetic divergence among eight Coffea species and to identify the parentage of six interspecific hybrids. A total of 14 primers containing different simple sequence repeats (SSR) were used as single primers or combined in pairs and tested for PCR amplification. Two hundred and thirty highly reproducible fragments were amplified and then used to estimate genetic similarity and to cluster the Coffea species and hybrids. High levels of interspecific genetic variation were revealed. The dinucleotide motif (GA)9T combined with other di-, tri-, and tetranucleotide motifs produced a greater number of DNA fragments, mostly polymorphic, suggesting a high frequency of poly GA microsatellite motifs in the Coffea genomes. The genetic similarity ranged from 0.25 between C. racemosa and C. liberica var. dewevrei to 0.86 between C. arabica var. arabica and Hybrid N. 2. Coffea arabica shared most of its markers with five of the six hybrids, suggesting that it is the most likely candidate as one of the progenitors of those hybrids. These results show that ISSR markers can be used efficiently for genetic differentiation of Coffea species and to identify the parentage of Coffea interspecific hybrids.


Introduction
The genus Coffea subgenus Coffea (Rubiaceae) is represented by about 80 species distributed in continental Africa and Madagascar (Fazuoli et al., 2000). Identification of Coffea species is extremely important, not only from the taxonomic standpoint but also for breeding purposes. Wild species are sources of highly advantageous traits, such as resistance to diseases and insects, adaptation to many environmental conditions, and other plant characters, for instance root system, plant architecture, flowers, fruits, and leaves. Coffea germplasm collections have been established in many countries such as the Ivory Coast, Madagascar, Costa Rica, Colombia, and Brazil. Nevertheless, a complete Coffea collection is lacking, since most of the available collections hold only a few species besides many C. arabica and C. canephora accessions. Additionally, in many collections plants were introduced from only a few seeds; they therefore do not represent the original species and are actually interspecific hybrids (Fazuoli et al., 2000). While many Coffea species carry important characters for breeding purposes, the best quality coffee is produced from the species C. arabica, which represents around 70% of the world coffee production (Berthaud and Charrier, 1988). Coffea canephora var. robusta (robusta coffee), the other economically important species, accounts for about 30% of the coffee produced in the world. Arabica coffee is cultivated in many developing countries, where it represents an important source of income and employment (Anthony et al., 2001). Several C. arabica accessions are available in the established collections. Nevertheless, breeding programs are limited by the very narrow genetic base of C. arabica, particularly when the aim is improvement for pest and disease resistance (van der Vossen, 1985), since most of the known cultivars have derived from the same base population (Carvalho, 1946).
Coffea arabica has an allopolyploid (2n = 4x = 44) origin derived from naturally occurring interspecific hybridization (Carvalho, 1962; Charrier and Berthaud, 1985). In spite of its importance, the origin of C. arabica is still not very clear. Studies based on genomic in situ hybridization (GISH) and fluorescent in situ hybridization (FISH) confirmed the allopolyploid nature of C. arabica and indicated that C. eugenioides and C. congensis are the diploid progenitors of C. arabica (Raina et al., 1998). Lashermes et al. (1999), on the other hand, suggested that C. arabica arose by hybridization between C. eugenioides, as female parent, and C. canephora or related ecotypes, followed by polyploidization. The mode of origin of the polyploidy remains obscure; however, it is likely to involve unreduced gametes (Lashermes et al., 1999).
Knowledge of the genetic diversity and relationships available in Coffea collections is important for planning breeding strategies, for identification of interspecific hybrids, and for germplasm conservation. It is also important for finding the best choice of individuals to cross in hybrid combinations by optimizing the expression of genes of interest. In addition to morphological traits, the extent of genetic diversity has been studied in cultivated plant species using a variety of chemical and molecular characters. However, the potential to detect polymorphism depends on the method used. The analysis of six isozyme patterns in different C. arabica accessions from Kenya and Ethiopia revealed an absence of polymorphism, contrasting with the high level of morphological variation. The results suggest that isozymes are not appropriate for the study of genetic diversity and identification of C. arabica accessions (Orozco-Castillo et al., 1994). The development of restriction fragment length polymorphism (RFLP) analysis demonstrated the viability of studying DNA sequence polymorphisms to investigate the genetic relationship between individuals and to construct linkage maps (Botstein et al., 1980). In Coffea, RFLP of cpDNA was used to detect genetic variation in 25 different taxa and to study cpDNA inheritance in interspecific hybrids between C. arabica and C. canephora (Lashermes et al., 1996). More recently, detection and exploitation of genetic polymorphism have been improved with the use of PCR-based markers. Examples of the most commonly used PCR-based techniques include random amplified polymorphic DNA (RAPD) (Williams et al., 1990; Welsh and McClelland, 1990) and amplified fragment length polymorphism (AFLP) (Vos et al., 1995).
Another powerful DNA marker technique is the PCR amplification of tandemly repeated sequences referred to as Simple Sequence Repeat (SSR) or microsatellite polymorphism. Microsatellites are very polymorphic and widespread in plant genomes (Morgante and Olivieri, 1993). In SSR analysis, the number of repeat units determines the fragment length polymorphism, and heterozygotes for different fragments in diploid genomes can usually be distinguished. Individual loci corresponding to a specific primer pair are therefore co-dominant and can be multi-allelic. The products generated have been found to be highly reproducible (Jones et al., 1997). Zietkiewicz et al. (1994) and Kantety et al. (1995) described a marker system named Inter-Simple Sequence Repeat (ISSR) amplification. The ISSR analysis involves the PCR amplification of regions between adjacent, inversely oriented microsatellites using a single SSR-containing primer. The technique can be applied to any species that contains a sufficient number and distribution of SSR motifs and has the advantage that genomic sequence data are not required (Gupta et al., 1994; Goodwin et al., 1997). The primers used in ISSR can be based on any di-, tri-, tetra-, or pentanucleotide SSR motifs found at microsatellite loci, giving a wide array of possible amplification products (Blair et al., 1999). The technique is more reliable than RAPD and generates larger numbers of polymorphisms per primer because variable regions in the genome are targeted (Hantula et al., 1996). The potential use of ISSR markers depends on the variety and frequency of microsatellites, which change with the species and with the targeted SSR motifs (Morgante and Olivieri, 1993). In addition, the number of bands produced by an ISSR primer with a given microsatellite repeat should reflect the relative frequency of that motif in the genome and would provide an estimate of the motif's abundance as an alternative to library hybridization (Blair et al., 1999). The ISSR technique has been used to assess genetic diversity in maize inbred lines (Kantety et al., 1995), to investigate the organization, frequency, and level of polymorphism of different SSR motifs and for fingerprinting in rice (Blair et al., 1999), and for fingerprinting in potato (McGregor et al., 2000).
The objective of this research was the determination of the abundance and level of polymorphism of different ISSR loci in order to evaluate the genetic diversity of eight Coffea species, to deduce the probable origin of the C. arabica genomes, and to determine parentage of a further six interspecific hybrids.

Plant material
The plant material, corresponding to eight Coffea species (one with two varieties) and six interspecific hybrids, is listed in Table 1. Species and hybrids were obtained from the Germplasm Collection of the Instituto Agronômico do Paraná (IAPAR), Londrina, PR, Brazil. These Coffea species have been extensively used as a source to transfer important genes to the cultivated species C. arabica and C. canephora.

DNA extraction and ISSR amplification
Genomic DNA was isolated from fresh leaf tissue following the CTAB method (Doyle and Doyle, 1987), except that CTAB was replaced by MATAB (Mixed Alkyltrimethylammonium Bromide, Sigma) in the extraction buffer. The DNA concentration was estimated using a fluorometer (DyNA Quant 200, Hoefer-Pharmacia), according to the manufacturer's instructions. DNA samples, obtained from at least five different plants of each species, were adjusted to 10 ng/µL and bulked (Michelmore et al., 1991) for use in PCR amplifications.
A total of 14 primers (Life Technologies) were used for ISSR amplification in the Coffea species and hybrid genomes. The dinucleotide motifs were all 18-mers consisting of (GA)9, (TC)9, and (AT)9 repeats. The GA repeat was anchored at the 3' end with a C or T selective nucleotide. The trinucleotide repeats were 15- and 18-mers and the tetranucleotide motifs were 16-mers (Table 2). For DNA amplification, the nucleotide motifs were used as single primers or combined in pairs, with most of these combinations including the poly (GA)9T motif together with di-, tri-, and tetranucleotide repeats (Table 2). The reactions were carried out in a final volume of 15 µL containing 1.5 µL of 10x buffer (75 mM Tris-HCl, pH 9.0), 50 mM KCl, 2.0 mM MgCl2, and 20 mM (NH4)2SO4; 0.2 mM each of dATP, dTTP, dCTP, and dGTP; 0.5 µM of each primer; 0.8 U of Taq DNA polymerase (Biotools); and 20 ng of template DNA. PCR amplification was carried out using a PTC 200 (MJ Research) thermal cycler programmed with 3 min at 94 °C for initial DNA denaturation, followed by 39 cycles of denaturation at 94 °C for 15 s; annealing at 50 °C for 1 min and 45 s; extension at 72 °C for 2 min. The final cycle was followed by a 7 min extension at 72 °C. The samples were stored at 4 °C until electrophoresis. The ISSR amplification products were resolved in 1.2% Metaphor agarose (FMC Bioproducts) gels in 1x TAE (40 mM Tris-acetate, 1 mM EDTA, pH 8.0) buffer at 120 V for 3 h and stained with ethidium bromide. The ISSR profiles were visualized under UV light, photographed with a video camera, and stored for further analysis.

Data analysis
The ISSR products were scored for the presence (1) or absence (0) of homologous DNA bands. The similarity matrix was based on the Dice coefficient and the dendrogram was constructed with the UPGMA (unweighted pair-group method using arithmetic averages) method of the NTSYS-pc (Numerical Taxonomy and Multivariate Analysis System for personal computers) software, version 2.1 (Rohlf, 2000). The reliability of the tree topology was evaluated by the bootstrap method (Felsenstein, 1985) with 1000 resamplings. The calculations were performed with the BOOD software, version 3.1 (Coelho, 2001). The mean coefficient of variation (CV), based on the assessment of the errors associated with the estimation of genetic similarity from the 230 ISSR markers, was obtained after 1000 bootstrap samples using the Dboot software, version 1.1 (Coelho, 2001). The matrix of genetic similarity was also used in a principal coordinate analysis (PCOORDA) to resolve the patterns of clustering among the genotypes. The cophenetic correlation coefficient between the matrix of genetic similarity and the dendrogram was computed using the appropriate routine of the NTSYS-pc package. The significance of the cophenetic correlation was tested by the Mantel correspondence test (Mantel, 1967).
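The scoring and clustering steps described above can be illustrated with a minimal sketch. It is not the NTSYS-pc implementation: the band profiles are hypothetical toy data (six bands, four genotypes labelled A to D), and the function names `dice` and `upgma` are introduced here only for illustration. The Dice coefficient is taken as twice the number of shared bands divided by the total number of bands in both profiles, and UPGMA is implemented as average linkage on the similarity values.

```python
from itertools import combinations

def dice(a, b):
    """Dice similarity between two 0/1 band profiles of equal length."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 0.0

def upgma(profiles):
    """Average-linkage clustering on Dice similarity.

    Returns a list of merges (cluster_1, cluster_2, mean_similarity),
    where each cluster is a tuple of genotype names.
    """
    sim = {frozenset(pair): dice(profiles[pair[0]], profiles[pair[1]])
           for pair in combinations(profiles, 2)}

    def average(c1, c2):
        # UPGMA: mean of all pairwise similarities between the two clusters
        values = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # join the two clusters with the highest average similarity
        c1, c2 = max(combinations(clusters, 2), key=lambda pr: average(*pr))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical presence (1) / absence (0) scores, not data from this study
toy_profiles = {
    "A": [1, 1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0, 1],
    "C": [0, 0, 1, 1, 1, 0],
    "D": [0, 1, 1, 1, 1, 0],
}

merges = upgma(toy_profiles)
for left, right, similarity in merges:
    print(left, right, round(similarity, 3))
```

Each merge records the similarity level at which two clusters join, which is the information a dendrogram such as Figure 2 displays on its similarity axis.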

ISSR markers
The PCR amplification, performed with ISSR markers to assess the level of polymorphism in eight species (one with two varieties) and six interspecific hybrids of Coffea (Table 1), revealed a high percentage (96.5%) of polymorphic fragments (Table 2). This was not unexpected, since the ISSR technique amplifies microsatellite regions that are potentially polymorphic (Morgante and Olivieri, 1993). The electrophoresis pattern obtained with ISSR markers is illustrated in Figure 1. The matrix of genetic similarity, the resulting dendrogram, and the principal coordinate analysis with the graphic associations among the genotypes are shown in Table 3 and Figures 2 and 3, respectively. The cophenetic correlation (r = 0.91) indicates the extent to which the clustering of genotypes demonstrated in the dendrogram accurately represents the estimates of genetic similarity among species and between species and hybrids.
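As a rough illustration of what a cophenetic correlation measures, the sketch below correlates a similarity matrix with the similarity levels at which each pair of genotypes is joined in a dendrogram, and then applies a simple Mantel-style permutation test. The two matrices are hypothetical toy values for four genotypes (A to D), not data from this study, and the functions `matrix_correlation` and `mantel_p` are names introduced here; the NTSYS-pc routines may differ in detail.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def matrix_correlation(m1, m2, labels):
    """Correlate two symmetric matrices over their off-diagonal pairs."""
    pairs = [frozenset(p) for p in combinations(labels, 2)]
    return pearson([m1[p] for p in pairs], [m2[p] for p in pairs])

def mantel_p(m1, m2, labels, n_perm=999, seed=0):
    """Permutation test: shuffle the labels of m2 and recompute r."""
    rng = random.Random(seed)
    observed = matrix_correlation(m1, m2, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(labels, shuffled))
        permuted = {frozenset(relabel[t] for t in key): value
                    for key, value in m2.items()}
        if matrix_correlation(m1, permuted, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

labels = ["A", "B", "C", "D"]
# Hypothetical pairwise similarities
similarity = {frozenset("AB"): 0.857, frozenset("AC"): 0.286,
              frozenset("AD"): 0.500, frozenset("BC"): 0.000,
              frozenset("BD"): 0.286, frozenset("CD"): 0.857}
# Hypothetical cophenetic values: the level at which each pair joins in the tree
cophenetic = {frozenset("AB"): 0.857, frozenset("CD"): 0.857,
              frozenset("AC"): 0.268, frozenset("AD"): 0.268,
              frozenset("BC"): 0.268, frozenset("BD"): 0.268}

r, p_value = mantel_p(similarity, cophenetic, labels)
print(round(r, 2), round(p_value, 3))
```

With only four genotypes there are just 24 possible relabelings, so the permutation p-value in this toy case is very coarse; the test becomes informative with matrices of the size used in the study.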
According to Sneath and Sokal (1973), the number of traits required to stabilize genotype classification is very important in genetic relationship studies. The bootstrap procedure was applied to estimate the number of bands necessary to obtain a stable classification of all accessions. We observed that approximately 200 markers were sufficient for dendrogram stability, and the decrease in CV beyond that number was comparatively small (Figure 4). These results suggest that 230 markers (CV = 9.8%) are adequate for the analysis of Coffea species and hybrids.
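The idea of relating the coefficient of variation to the number of bands can be sketched as follows. This is one plausible reading of the procedure, not the Dboot algorithm itself: for each sample size, markers are resampled with replacement, the Dice similarity of every genotype pair is recomputed, and the CV (standard deviation divided by mean, in percent) is averaged over pairs. The band profiles are randomly generated toy data, and `mean_cv` is a hypothetical helper name.

```python
import random
import statistics
from itertools import combinations

def dice_on(a, b, idx):
    """Dice similarity restricted to the marker indices in idx."""
    shared = sum(1 for i in idx if a[i] and b[i])
    total = sum(a[i] for i in idx) + sum(b[i] for i in idx)
    return 2 * shared / total if total else 0.0

def mean_cv(profiles, n_markers, n_boot=200, seed=0):
    """Mean CV (%) of pairwise Dice similarity for a given number of markers."""
    rng = random.Random(seed)
    n_total = len(next(iter(profiles.values())))
    pairs = list(combinations(profiles, 2))
    values = {pair: [] for pair in pairs}
    for _ in range(n_boot):
        # bootstrap sample: draw marker indices with replacement
        idx = [rng.randrange(n_total) for _ in range(n_markers)]
        for p, q in pairs:
            values[(p, q)].append(dice_on(profiles[p], profiles[q], idx))
    cvs = []
    for sims in values.values():
        mean = statistics.mean(sims)
        if mean > 0:
            cvs.append(100 * statistics.stdev(sims) / mean)
    return statistics.mean(cvs)

# Hypothetical data: 5 genotypes scored for 230 random bands
rng = random.Random(1)
toy = {f"G{i}": [rng.randint(0, 1) for _ in range(230)] for i in range(5)}

for size in (20, 50, 100, 200, 230):
    print(size, round(mean_cv(toy, size), 1))
```

In a curve of this kind the CV falls steeply at small sample sizes and flattens as more bands are added, which is the behaviour used above to judge whether the number of markers is adequate.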
For ISSR to be successful, pairs of simple sequence repeats must occur within a short distance (in base pairs) that can be amplified by PCR, producing bands that are resolvable on standard gels (Zietkiewicz et al., 1994). In this study, three dinucleotide, four trinucleotide, and five tetranucleotide motifs amplified a total of 230 ISSR markers, which were consistently generated in all genotypes. Of all the ISSR primers tested, only a few produced well-defined and reproducible bands when used alone in PCR reactions. However, pair combinations of di-, tri-, and tetranucleotide motifs rendered a greater number of highly polymorphic bands. The best PCR amplification results were obtained with the poly (GA) dinucleotide repeat anchored at the 3' end with the T selective nucleotide (Table 2). Using this procedure, varying numbers (4 to 20) of polymorphic bands were produced, depending on the SSR motif of the second primer. When used alone, the poly (GA)9T primer rendered only six bands. The combination of the poly (GA)9T with the (GGAT)4 motif produced the largest number of bands (20). All these bands were polymorphic. Some significant differences were observed between the number of bands produced by other primer combinations. For example, ISSR amplifications that combined the poly (GA)9T with the [...]

Figure 1 - [...] M is the 100 bp DNA ladder (Gibco).

Genetic variation among Coffea species and parentage identification
The data matrix of genetic similarities and the dendrogram showing the relationships among the species and hybrids are presented in Table 3 and Figure 2, respectively. The principal coordinate analysis (Figure 3) was performed to graphically display the genetic associations among species and hybrids.

Figure 3 - Associations among Coffea species and interspecific hybrids obtained from a principal coordinate analysis using the Dice similarity matrix of 230 ISSR markers. The first, second, and third principal coordinates explain 19.2%, 15.3%, and 13.9% of the total variation, respectively. The numbers correspond to those listed in Table 1.

The analysis of species relationships showed that ISSR markers placed C. arabica var. arabica closer to C. canephora var. kouillou (0.68) and C. canephora var. robusta (0.66) than to any other species (Figures 2, 3). The association of C. canephora and the cluster containing C. arabica was supported by a moderate bootstrap (BS) value of 64%. Coffea eugenioides was also close to C. arabica, with a genetic similarity coefficient of 0.62 (86% BS). These results sustain the hypothesis that this species is one of the progenitors of the arabica genomes. Although classified in the same subsection, C. eugenioides and C. canephora were placed apart with a very low genetic similarity (0.42 and 0.37), supporting the considerable variation identified by 38 isozyme markers between these two species (Berthou et al., 1980). Coffea racemosa and C. stenophylla (classified in different subsections) showed a similarity coefficient of 0.45, with the cluster supported by a bootstrap value of 91%.
Coffea congensis appears in the same group as C. canephora (81% BS), showing similarity coefficients of 0.69 and 0.64 with the robusta and kouillou coffee, respectively. The coefficient of genetic similarity between C. congensis and C. arabica was 0.59. The genetic affinity between C. canephora and C. congensis was demonstrated in previous studies using RFLPs of cpDNA and mtDNA (Berthaud et al., 1980, 1983) and the percentage of genetic divergence (4.5%) of ITS2 sequences (Lashermes et al., 1997). While all these results sustain the associations shown for these species with the ISSR markers, the plants referred to as C. congensis used in this study may actually be interspecific hybrids between C. canephora and C. congensis. Comprehensive morphological analysis of C. congensis plants that were recently introduced to the coffee gene bank of the Instituto Agronômico de Campinas (IAC) revealed noticeable differences between the genotype used in this study and the original C. congensis (Fazuoli et al., 2000). These observations strengthen the associations obtained with ISSR markers between C. canephora and C. congensis, supporting the view that the latter could be an interspecific hybrid.
Interestingly, species classified in different subsections, such as C. eugenioides (Erythrocoffea) and C. kapakata (Mozambicoffea), were associated with higher genetic similarity (0.72) and a high bootstrap value (99%), while C. racemosa and C. liberica var. dewevrei, both from the subsection Mozambicoffea, were placed apart with a genetic similarity coefficient of only 0.25 (Table 3). The lowest genetic similarities were observed for C. liberica var. dewevrei with C. stenophylla (0.30) and with C. racemosa (0.25) and for C. racemosa with C. congensis and C. canephora var. robusta (0.30). Similar associations were observed with RAPD markers (Ruas et al., 2000) and with comparisons of ITS2 sequences (Lashermes et al., 1997). The low genetic similarity values identified with the ISSR markers (Table 3) could represent the actual genetic diversity among species and clearly demonstrate the usefulness of the technique in determining interspecific genetic relationships in Coffea.

Identification of C. arabica genomes
It is widely accepted that C. arabica derived from natural hybridization between different Coffea species. We therefore attempted to use ISSR analysis to deduce the putative parents of this species, given that this knowledge is important for more efficient planning of coffee breeding strategies. It is also well established that C. arabica possesses a very narrow genetic base and that it is highly susceptible to various pests and diseases (Wrigley, 1995; Paillard et al., 1996). The diploid species closely related to C. arabica, on the other hand, have wide genetic diversity and display important and useful agronomical characters (Dublin et al., 1991; Wrigley, 1995). The cluster analysis derived from the ISSR markers showed different coefficients of genetic similarity among the Coffea species (Table 3, Figures 1, 2). The highest genetic similarity of C. arabica var. arabica was with C. canephora var. kouillou (0.68), C. canephora var. robusta (0.66), and C. eugenioides (0.62). These associations are in agreement with previous studies on the origin of the arabica genomes. The main characteristic of the best quality coffee is the superior beverage defined by fine aroma and flavor. Besides C. arabica, C. eugenioides is the only wild species with these characteristics, perhaps explaining the origin of the good flavor of arabica coffee (Fazuoli et al., 2000). Based on RFLP analysis of cpDNA and mtDNA and allozyme data, Berthou and Trouslot (1977) and Berthou et al. (1980, 1983) suggested that C. eugenioides, as maternal parent, and C. canephora or cytotypes with related genomes, must be considered as the progenitors of C. arabica. Analysis of genomic in situ hybridization (GISH) showed that the C. eugenioides genomic DNA preferentially hybridized with 22 chromosomes of C. arabica while the other 22 chromosomes hybridized with the C. congensis genomic DNA. Fluorescent in situ hybridization (FISH) using two ribosomal genes provided additional support to the GISH results, confirming the allopolyploid nature of C. arabica and showing that C. congensis and C. eugenioides are the diploid progenitors of C. arabica (Raina et al., 1998). However, Lashermes et al. (1999), applying RFLP and GISH, clearly suggested that C. arabica is an amphidiploid derived from natural hybridization between C. eugenioides and C. canephora. Studies based on RAPD markers also placed C. arabica close to C. eugenioides, suggesting that this species is likely to be one of the parents of arabica coffee (Ruas et al., 2000). The coefficients of genetic similarity placed C. canephora and C. eugenioides very close to C. arabica, while C. congensis was less related, with a similarity coefficient of 0.59, supporting previous studies and emphasizing the utility of ISSR polymorphism to deduce the origin of the arabica genome.

Parentage identification of interspecific hybrids
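The ISSR markers were also used to deduce the putative parents of six interspecific hybrids (Table 1, Figures 1, 2). These hybrids are important sources of genes for resistance to several coffee diseases. The ISSR markers showed high genetic affinities between C. arabica var. arabica and five interspecific hybrids, forming a group that was supported as a cluster with a bootstrap of 97% (Table 1, Figure 2). ISSR data and morphological characters suggested that hybrid N. 1 derived from crossing between C. arabica and C. canephora var. robusta. The N. 1 hybrid was associated with C. arabica var. arabica with a genetic similarity coefficient of 0.82 (supported by a 73% BS). This hybrid also showed a relatively high genetic similarity to C. canephora var. robusta (0.71) and to C. canephora var. kouillou (0.65), demonstrating that the C. canephora genome is well represented in this hybrid. The other species (C. liberica var. dewevrei, C. stenophylla, and C. racemosa) showed low genetic similarities with N. 1 hybrid (Table 3, Figures 1 and 2). Results of RAPD markers were not conclusive in defining the origin of N. 1 hybrid (Ruas et al., 2000). The ISSR findings clearly suggest that C. arabica and C. canephora are the best candidates as progenitors of N. 1 hybrid.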
N. 2 and N. 5 hybrids were highly related, showing a genetic similarity of 0.80 and suggesting a common origin. Coffea arabica shared most of its ISSR markers with N. 2 and N. 5 hybrids, displaying high genetic similarity coefficients with these hybrids (0.86 and 0.83) and bootstrap values of 60% and 49%, respectively. Coffea eugenioides was also relatively close to both N. 2 hybrid (0.67) and N. 5 hybrid (0.61). These results are in agreement with previous studies based on RAPD data (Ruas et al., 2000). Whereas N. 5 hybrid showed morphological patterns similar to C. arabica, N. 2 hybrid was morphologically very similar to hybrid N. 5. Thus, molecular and morphological data indicate that both hybrids derived from crossing between C. eugenioides and C. arabica.
N. 4 hybrid, similar to N. 3 hybrid, is also an important source of genes for frost, leaf rust (Hemileia vastatrix), and nematode (Meloidogyne sp.) resistance. This hybrid was less related to C. arabica (similarity coefficient of 0.51) and it appeared closer to C. liberica var. dewevrei, showing a similarity coefficient of 0.58 and a highly supportive bootstrap value of 90%. Many morphological characters of N. 4 hybrid resemble C. liberica var. dewevrei and C. eugenioides. The associations based on ISSR markers are also consistent with those obtained by RAPD (Ruas et al., 2000), suggesting that C. liberica var. dewevrei and C. eugenioides are the putative parents of the N. 4 hybrid.
N. 6 hybrid is a sterile triploid derived from crossing between tetraploid and diploid parents. The ISSR markers associated N. 6 hybrid with C. arabica with high genetic similarity (0.78), suggesting that they are closely related. Morphological characteristics showed that this hybrid has narrow leaves and carries genes for drought, frost, and leaf miner (Perileucoptera coffeella) resistance, as observed in C. racemosa. The ISSR markers associated N. 6 hybrid with C. racemosa with a similarity coefficient of 0.50. The same conclusion was reached with RAPD data (Ruas et al., 2000). Therefore, it is assumed that N. 6 hybrid derived from crossing between C. arabica and C. racemosa.

Genetic relationship in Coffea species
The results achieved with ISSR markers are in agreement with morphological, molecular, and crossability data, suggesting that ISSR analysis can be successfully applied to study genetic relationships, to deduce the putative parents of C. arabica, and to identify the species involved in the origin of interspecific hybrids. These results provide an essential basis for a more accurate picture of genetic diversity among Coffea species that may assist hybrid breeding programs.

Figure 2 - Unweighted pair group method with arithmetic averages (UPGMA) dendrogram showing the relationships among eight Coffea species and six interspecific hybrids. The diagram was based on the Dice similarity coefficients from 230 ISSR markers. Numbers at branches are bootstrap values (%) generated after 1000 replications.

Figure 4 - Relationship between the mean coefficient of variation (CV) and sample size (number of bands). The CV value (CV = 9.8%) was obtained after 1000 bootstrap samplings using the Dice similarity coefficient.

Table 1 - Accessions and collection numbers of Coffea species and interspecific hybrids studied.
a Number in the active germplasm collection at the Instituto Agronômico do Paraná (IAPAR), Londrina, Paraná, Brazil.
b Introduced to the IAPAR collection from the Instituto Agronômico de Campinas (IAC).

Table 2 - Primer sequences, total number of amplified ISSR markers, and polymorphic index obtained for Coffea species and interspecific hybrids.
b no amplification.
suggested that C. eugenioides, as maternal parent, and C. canephora or cytotypes with related genomes, must be considered as the progenitors of C. arabica. Analysis of genomic in situ hybridization (GISH) showed that the C. eugenioides genomic DNA preferentially hybridized with 22 chromosomes of C. arabica while the other 22 chromosomes hybridized with the C. congensis genomic DNA. Fluorescent in situ hybridization (FISH) using two ribosomal genes provided additional support to the GISH results, confirming the allopolyploid nature of C. arabica and showing that C. congensis and C. eugenioides are the diploid progenitors of C. arabica

Table 1, 2). ISSR data and morphological characters suggested that hybrid N. 1 derived from crossing between C. arabica and C. canephora var. robusta. The N. 1 hybrid was associated with C. arabica var. arabica with a genetic similarity coefficient of 0.82 (supported by a 73% BS). This hybrid also showed a relatively high genetic similarity to C. canephora var. robusta (0.71) and to C. canephora var. kouillou (0.65), demonstrating that the C. canephora genome is well represented in this hybrid. The other species (C. liberica var. dewevrei, C. stenophylla, and C. racemosa) showed low genetic similarities with N. 1 hybrid (Table