Crop Breeding and Applied Biotechnology - 20(1): e27812013, 2020
Microsatellite DNA fingerprinting of Coffea sp. germplasm conserved in Costa Rica through singleplex and multiplex PCR

Abstract: A large collection of coffee genetic resources is conserved in Costa Rica. In this study, microsatellite DNA fingerprinting of coffee through singleplex and multiplex PCR approaches coupled with capillary electrophoresis is described. To validate both methods, germplasm of Coffea spp. (Arabica and non-Arabica) and intraspecific F1 hybrids were analyzed using fourteen microsatellite markers. Both PCR methods yielded identical fingerprinting profiles for a subset of samples. The genetic analyses revealed that non-Arabica coffee displayed greater genetic variation than Arabica coffee. In addition, microsatellite analyses allowed the separation of C. arabica from other species using the principal coordinate analysis (PCoA) approach. The neighbor-joining tree clustering analysis revealed both a grouping of wild genotypes separate from cultivars of C. arabica and the relationship of intraspecific F1 hybrids with their parental lines. The utility of our methodology for the characterization of F1 hybrids not previously analyzed through SSR (Simple Sequence Repeats) fingerprinting is demonstrated.


INTRODUCTION
Of the Rubiaceae family, two species are mainly responsible for world coffee production: Coffea arabica L. and Coffea canephora Pierre ex A. Froehner, commonly known as Arabica and Robusta coffee, respectively. C. canephora is a diploid species (2n = 2x = 22) and C. arabica is a tetraploid species (2n = 4x = 44) derived from the hybridization of C. canephora and C. eugenioides (Lashermes et al. 1999). C. arabica is indigenous to Ethiopia and was first cultivated in Yemen in the seventeenth century. Until the middle of the twentieth century, cultivated coffee grown in Latin America shared the same genetic base as the Yemen plants. Coffee plants introduced to Latin America are believed to be C. arabica var. 'Typica' and C. arabica var. 'Bourbon' (Anthony et al. 2002). In Costa Rica, coffee was introduced at the end of the 18th century; the first seeds are believed to have come from 'Typica' from Martinique Island (ICAFE 2015). These events of introduction and domestication, as well as the allotetraploid origin, reproductive behavior, and evolution of the species, have contributed to a narrow genetic base of coffee, as demonstrated with molecular marker analyses (Lashermes et al. 1999, Anthony et al. 2001, Zhou et al. 2016, Sousa et al. 2017).
Elodia Sánchez et al.
In the period between October 2017 and June 2019, 62-65% of the global coffee production by exporting countries came from C. arabica (International Coffee Organization 2017). More than half of the production of coffee in the world is supplied by Latin America, with Brazil and Colombia being the major producers. Costa Rica is responsible for about 1% of the world coffee production (International Coffee Organization 2017). Breeding efforts in Costa Rica have been developed through national and international participation of the Costa Rican Coffee Institute (ICAFE), the Tropical Agricultural Research and Higher Education Center (CATIE), the Regional Cooperative Program for the Technological Development and Modernization of Coffee Cultivation (PROMECAFE), and the French Agricultural Research Centre for International Development (CIRAD). Some of this work was undertaken to obtain F1 intraspecific hybrids derived from crosses between wild Sudan-Ethiopian accessions of C. arabica (ET6, ET15, ET25, E41, E416, E531, Anfilo, and Rume Sudan) and American cultivars of C. arabica (Caturra, Catuaí, T5296). The hybrids tested in Central America showed stability, high yields and satisfactory beverage quality (Bertrand et al. 2006, Bertrand et al. 2011). Conventional breeding efforts have included strategies such as hybridization, selection by interspecific crossings, and backcrossing to transfer resistance to biotic stresses and to improve adaptation and yield (Silva et al. 2019, Shigueoka et al. 2014). The lack of reproductive precision, the differences in ploidy levels between C. arabica and other diploid species, and their incompatibility are limitations associated with conventional coffee breeding. Another limitation that hinders breeding programs is that the selection of genetically diverse parental lines for hybridization and the identification of hybrids at an early stage (e.g. seedlings) are based on morphological characteristics (Mishra and Slater 2012).
A proper identification of germplasm for breeding and conservation purposes requires the development of tools that can accelerate the characterization of germplasm and make it more reliable, to increase the efficiency of coffee breeding programs (Hendre et al. 2008). In this context, several molecular markers have been employed for coffee evaluation; they include RFLP (Lashermes et al. 1999), RAPD (Anthony et al. 2001), AFLP (Anthony et al. 2002), simple sequence repeats (SSR) (Missio et al. 2011, Ogutu et al. 2016), and single-nucleotide polymorphism (SNP) approaches (Zhou et al. 2016). Molecular markers have also served for linkage mapping and quantitative trait loci (QTL) analyses (Moncada et al. 2016), and association studies of SNP with caffeine content (Tran et al. 2018). Although next-generation sequencing (NGS) technologies have led to an increasing use of SNP markers for several of the applications mentioned above, microsatellites are still a viable option (Flanagan and Jones 2019).
Microsatellites exhibit several advantages that make them suitable for the characterization of plant genetic resources, including coffee species. These advantages include high polymorphism, reproducibility, relatively easy scoring, and codominance. Microsatellite-based analyses are also amenable to automation through the use of genetic analyzers with capillary electrophoresis (CE) coupled with laser-induced DNA fluorescence, which allows the combination of disparate microsatellites (Butler et al. 2004). Microsatellites are well suited to true multiplexing, which decreases the duration and cost of microsatellite genotyping (Guichoux et al. 2011). Regarding the use of multiplex PCR for microsatellite analysis in coffee, to the best of our knowledge, only Aerts et al. (2013) used this approach for population genetics studies in C. arabica; however, the method was not described in detail. Therefore, the aim of the present study was to provide a detailed procedure for microsatellite fingerprinting through both singleplex and multiplex PCR coupled with capillary electrophoresis. This analysis can be applied as an alternative to, or together with, other marker systems to characterize the genetic diversity of coffee germplasm and its genetic relationships, including F1 intraspecific hybrids that were not previously reported.

Screening and selection of microsatellites
Twenty-seven microsatellites were initially chosen, taking into account the cross-species amplification in Coffea sp. and the highest polymorphic information content (PIC) as described previously (Combes et al. 2000, Baruah et al. 2003, Poncet et al. 2004, Aggarwal et al. 2007, Hendre et al. 2008, Missio et al. 2009, Missio et al. 2010, Sousa et al. 2017). PCR amplification was tested on the DNA of 14 genotypes that were distributed in three DNA pools (Table 1). PCR mixes (final volume 25 μL) contained DNA (~100 ng) and 1× DreamTaq PCR Master Mix (Thermo Scientific, Delaware, USA), with MgCl2 (2 mM) and primers (0.2 μM each). The amplification was performed in a thermal cycler (Veriti®, Applied Biosystems, California, USA) with an initial denaturation step at 95 °C for 5 min, followed by 35 cycles at 95 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min, and a final extension at 72 °C for 8 min. PCR products were separated by polyacrylamide gel electrophoresis (10% denaturing gel, DCode™, Bio-Rad, California, USA) at constant voltage (200 V) for 6 h. Fourteen of the 27 microsatellites tested showed polymorphisms between the three DNA pools, and were selected for further singleplex PCR, multiplex PCR, and capillary electrophoresis.
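As a quick planning aid, the cycling program above can be written down as data and summed. The sketch below only adds the programmed hold times stated in the text; it ignores ramping between temperatures, so the real run time on any given thermal cycler will be longer. The variable and function names are illustrative and not part of the published protocol.

```python
# Cycling parameters as stated in the text (all times in seconds).
INITIAL_DENATURATION_S = 5 * 60          # 95 °C, 5 min
CYCLES = 35
CYCLE_STEPS_S = {
    "denaturation_95C": 30,
    "annealing_55C": 30,
    "extension_72C": 60,
}
FINAL_EXTENSION_S = 8 * 60               # 72 °C, 8 min


def total_hold_time_minutes():
    """Sum of programmed hold times; ramp times are deliberately ignored."""
    per_cycle_s = sum(CYCLE_STEPS_S.values())
    total_s = INITIAL_DENATURATION_S + CYCLES * per_cycle_s + FINAL_EXTENSION_S
    return total_s / 60


print(f"Programmed hold time: {total_hold_time_minutes():.0f} min")
```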

Data analysis
After capillary electrophoresis, allele binning and genotyping were manually inspected with the GeneMapper V4.0 software (Applied Biosystems, California, USA). Using the same software, the molecular profile of 12 samples analyzed through multiplex PCR was compared with the profile generated from singleplex PCR amplification. We employed the tetraploid data set of microsatellites in the ATETRA software (Van Puyvelde et al. 2010) to estimate the following genetic diversity parameters: number of alleles, expected heterozygosity (He), expected heterozygosity corrected for sample size (He(c)), Shannon-Wiener diversity index (H'), and genetic differentiation coefficients (Gst and Rst). The polymorphic information content (PIC) and the pairwise genetic distance matrix were calculated with the R package Polysat 1.7 (Clark and Jasieniuk 2011). The distance matrix was used in a principal coordinate analysis (PCoA) to visualize the grouping among the studied species. The same matrix was used in a neighbor-joining analysis to construct a tree with only C. arabica (cultivars and wild genotypes); a second tree was built with F1 hybrids and parental lines.
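To make the diversity parameters concrete, the following minimal sketch computes three of them from a vector of allele frequencies at one locus, using the textbook definitions: He = 1 - Σp², H' = -Σp·ln(p), and the PIC formula of Botstein et al. (1980). It is not the ATETRA or Polysat implementation; estimating allele frequencies in tetraploids and the sample-size correction He(c) are left out, and the example frequencies are invented for illustration.

```python
import math
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies of one locus."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i), skipping absent alleles."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele frequencies for a three-allele locus (must sum to 1).
example_freqs = [0.5, 0.25, 0.25]

print("He  =", expected_heterozygosity(example_freqs))
print("H'  =", round(shannon_index(example_freqs), 4))
print("PIC =", polymorphic_information_content(example_freqs))
```

PIC is never larger than He for the same frequencies, because it subtracts an additional non-negative term.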

Genetic diversity analyses comparing singleplex and multiplex PCR
The molecular profile generated using the multiplex PCR protocol was the same as that generated with the singleplex PCR protocol. The approach followed in our study was similar to that previously proposed (Hill et al. 2009), which relies on a core set of markers that amplify without major optimization; other primers are then added. To obtain identical profiles with both PCR approaches, extensive optimization steps were performed for the primer concentrations of the microsatellites as well as for the ratio of the pseudo-multiplex PCR products for capillary electrophoresis (Table 2). As suggested by Sutton et al. (2011), the fluorescent dye was kept the same for each SSR in both singleplex and multiplex PCR assays, which also helped to obtain identical profiles.
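The comparison of singleplex and multiplex profiles was made in GeneMapper; the same check can be sketched in a few lines once allele calls are exported. The example below assumes a hypothetical layout in which each sample maps marker names to a tuple of allele sizes in base pairs. The sample names and allele sizes are placeholders, not data from this study.

```python
def profile_concordance(singleplex, multiplex):
    """Compare allele calls marker by marker.

    Returns the fraction of sample/marker pairs with identical allele sets
    and the list of (sample, marker) pairs that disagree. Pairs missing
    from the multiplex data are skipped rather than counted as mismatches.
    """
    compared = 0
    mismatches = []
    for sample, markers in singleplex.items():
        for marker, alleles in markers.items():
            other = multiplex.get(sample, {}).get(marker)
            if other is None:
                continue
            compared += 1
            if sorted(alleles) != sorted(other):
                mismatches.append((sample, marker))
    rate = (compared - len(mismatches)) / compared if compared else float("nan")
    return rate, mismatches


# Placeholder calls (allele sizes in bp are invented for illustration).
example_singleplex = {
    "sample_1": {"CaM16": (180, 184), "M774": (210,)},
    "sample_2": {"CaM16": (180,), "M774": (210, 216)},
}
example_multiplex = {
    "sample_1": {"CaM16": (184, 180), "M774": (210,)},   # same alleles, other order
    "sample_2": {"CaM16": (180,), "M774": (210, 216)},
}

rate, mismatches = profile_concordance(example_singleplex, example_multiplex)
print(f"concordance = {rate:.2f}, mismatches = {mismatches}")
```

Comparing sorted allele lists makes the check independent of the order in which the peaks were exported.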
The total numbers of alleles (NA) amplified in the Arabica and non-Arabica groups were 56 and 49, while the average numbers of alleles were 4.00 and 3.50, respectively (Table 3). The average number of alleles per SSR was greater in the Arabica group than previously reported, while similar or smaller for the non-Arabica group (Combes et al. 2000, Moncada and McCouch 2004, Aggarwal et al. 2007, Hendre et al. 2008, Sousa et al. 2017). Both the total number of alleles and the average number of alleles per SSR were influenced by the number of samples, their genetic background, the number of microsatellites employed, and the polymorphism level of each microsatellite. In order to evaluate the robustness of our methodology, we included F1 intraspecific hybrids that had not been previously included in microsatellite studies, which, added to cultivars and wild genotypes, contributed to an increased average number of alleles per SSR in the Arabica group.
The average values of PIC, He, He(c), and H' index were higher in the non-Arabica group (Table 3). Similar to the study of Baruah et al. (2003), our results demonstrate little polymorphism across the Arabica genotypes. Moncada and McCouch (2004) also found that polymorphism levels were significantly higher among diploid or wild tetraploid species of coffee than within cultivated Arabica coffee. The Shannon index of diversity (H') has also been reported to be greater in non-Arabica genotypes (Anthony et al. 2001, Moncada and McCouch 2004). Higher polymorphism in non-Arabica coffee has been attributed to its outcrossing reproduction, whereas Arabica coffee is self-compatible. C. arabica will always present less polymorphism than other coffee species due to the evolution of its genome and domestication process (Hendre et al. 2008, Missio et al. 2010). Another parameter estimated to evaluate the suitability of the 14 microsatellites was the index of gene differentiation for multiple alleles (Gst). The average Gst was 0.1494, with the lowest value for CaM16 (0.0416) and the highest value for M774 (0.4466). According to the classification of differentiation coefficients described by Wright (1978), only microsatellite CaM16 had a small differentiation coefficient; seven microsatellites (SSR09, M764, SSRCa88, SSR03, M32, SSR073, and M753) had moderate differentiation, and six (SSRCa18, SSR04, SSRCa87, CaM03, M24, and M774) scored high differentiation (Table 3). In addition, the Rst differentiation coefficient (Slatkin 1995) was calculated, with an average value of 0.3958. The lowest value (0.0875) was also found for CaM16, and the highest value (1.6419) was for M774 (Table 3). The values of the genetic differentiation coefficient Rst were greater than those of Gst for all SSR (Table 3).
The Rst coefficient assumes the stepwise mutation model, which is thought to reflect the mutation pattern of microsatellites more accurately and tends to increase the Rst value (Balloux and Lugon-Moulin 2002). The 14 microsatellites analyzed for our study proved to be efficient, not only for the characterization of genetic diversity, but also for germplasm differentiation, as described in the following sections.
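The grouping of Gst values into small, moderate, and high classes can be expressed as a simple threshold rule. The sketch below assumes the commonly quoted Wright (1978) guideline (below 0.05 little, 0.05 to 0.15 moderate, above 0.15 great or very great) and collapses the two upper classes into "high" to match the three labels used in the text. The exact cut-offs applied for Table 3 are not stated, so the thresholds here are an assumption; only the two Gst values quoted in the text are used.

```python
# Assumed cut-offs following the usual reading of Wright (1978).
SMALL_UPPER = 0.05
MODERATE_UPPER = 0.15


def classify_gst(gst):
    """Return 'small', 'moderate', or 'high' for a differentiation coefficient."""
    if gst < SMALL_UPPER:
        return "small"
    if gst <= MODERATE_UPPER:
        return "moderate"
    return "high"


# The two per-marker values quoted in the text.
reported_gst = {"CaM16": 0.0416, "M774": 0.4466}

for marker, value in reported_gst.items():
    print(marker, value, classify_gst(value))
```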

Genetic differentiation of Coffea spp. germplasm
A clear separation of Arabica and non-Arabica genotypes was possible employing the 14 microsatellites, as observed in the PCoA (Figure 1A). Similarly, a PCoA revealed disparate groups containing either accessions of diploid species of Coffea sp. or tetraploid C. arabica from Colombia, for which 34 microsatellites were used (Moncada and McCouch 2004). The clustering of Arabica cultivars and wild genotypes was clearer in the neighbor-joining tree, in which four groups were distinguished with most cultivars clustered in groups A and B (Figure 1B). Wild genotypes and the commercial variety T5296 were clustered in groups C and D. Groups A and B contain genotypes that, according to World Coffee Research (2018), are classified as derived from Typica (Typica), Bourbon (Bourbon, Caturra, Villa Sarchí, Venecia, and SL-28), Bourbon-Typica (Catuaí), and as introgressed germplasm (T5175, IAPAR59). Some genotypes analyzed in this study are derived from the Timor Hybrid (Catimor, Catigua MG2, T5175, Oeiras, MG6851, IAPAR 59, Tupi RN, and T5296), which grouped interspersed in groups A and B, and the genotype T5296 in group D. Our results indicate that samples derived from the Timor Hybrid share alleles with C. arabica from wild accessions and cultivars. The allelic diversity detected over the 14 microsatellites might indicate that alleles from the original hybrid have spread through the genotypes derived from the Timor Hybrid, which showed notable diversity and interspersed grouping (Setotaw et al. 2010). The differentiation between Coffea species, and between the wild and domesticated germplasm in our study, demonstrates that the selected set of microsatellites is useful for genetic characterization of coffee germplasm.
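For readers unfamiliar with PCoA, the sketch below shows the classical procedure on a small distance matrix: square the distances, double-center them, and scale the leading eigenvectors by the square roots of their eigenvalues. It is a pure-Python illustration using power iteration with deflation, not the routine used for Figure 1A. It assumes the leading eigenvalues are positive and well separated (true for Euclidean-like distances), and a real analysis would rely on a linear algebra library. The toy matrix is invented.

```python
import math


def double_center(dist):
    """Gower-centered matrix B from squared distances: B = -1/2 * J D^2 J."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]        # equals column means (symmetry)
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def leading_eigenpair(mat, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]        # fixed, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:                          # nothing left to extract
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    value = sum(vec[i] * mv[i] for i in range(n))  # Rayleigh quotient
    return value, vec


def pcoa(dist, axes=2):
    """Return one coordinate list per sample for the requested number of axes."""
    b = double_center(dist)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(axes):
        value, vec = leading_eigenpair(b)
        scale = math.sqrt(value) if value > 0 else 0.0
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflation: remove the component just extracted.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Toy distances between three samples lying on a line at positions 0, 1 and 3.
toy_dist = [[0.0, 1.0, 3.0],
            [1.0, 0.0, 2.0],
            [3.0, 2.0, 0.0]]
toy_coords = pcoa(toy_dist)
for name, xy in zip(["s1", "s2", "s3"], toy_coords):
    print(name, [round(v, 3) for v in xy])
```

Because the toy samples lie on a line, the first axis reproduces their pairwise distances and the second axis carries no variation.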
Although cultivars grouped according to previous reports, some accessions had the same or a similar genetic profile. In group A, genotypes PDRY-14 and PDRY-22 displayed the same fingerprinting profile for the 14 microsatellites, as did Catuaí Amarillo and Caturra. In addition, Bourbon, Oeiras MG6851, Villa Sarchí, and Venecia also had the same fingerprinting profile; they grouped near Catuaí Amarillo and Caturra because of their highly similar DNA fingerprints. Similarly to our results, Sousa et al. (2017) found that a set of 16 microsatellites was unable to distinguish cultivars derived from Catuaí, which is included in the Bourbon-Typica group. It has also been reported that the use of 55 SNP markers did not differentiate between Caturra and Catuai (Zhou et al. 2016). These authors also found that the profile of Bourbon differed from those of Caturra and Catuai by only a single SNP, which indicated the origin of Caturra and Catuai as mutants or offspring derived from Bourbon.
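Accessions that share an identical fingerprint across all markers can be found by hashing each multilocus profile and grouping samples with the same key. The sketch below assumes a hypothetical input in which each accession maps marker names to allele sizes; the accession labels, marker names, and allele sizes are placeholders, not results from this study.

```python
from collections import defaultdict


def group_identical_profiles(profiles):
    """Return lists of accessions whose allele calls match at every marker."""
    groups = defaultdict(list)
    for accession, markers in profiles.items():
        # Order-independent key: markers sorted by name, alleles sorted by size.
        key = tuple(sorted((marker, tuple(sorted(alleles)))
                           for marker, alleles in markers.items()))
        groups[key].append(accession)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Placeholder fingerprints (allele sizes in bp are invented for illustration).
example_profiles = {
    "acc_1": {"marker_A": (150, 154), "marker_B": (201,)},
    "acc_2": {"marker_A": (154, 150), "marker_B": (201,)},
    "acc_3": {"marker_A": (150, 158), "marker_B": (201,)},
}

print(group_identical_profiles(example_profiles))
```

Accessions flagged this way are indistinguishable only with respect to the markers supplied; additional loci may still separate them.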

Grouping of F1 intraspecific hybrids
The DNA fingerprinting protocol for coffee germplasm analysis described in this study was used to produce genetic information on F1 intraspecific hybrids that had not been previously genotyped; consequently, we also enrich the existing knowledge of coffee genetic diversity. F1 intraspecific hybrids analyzed in this study are the result of crosses between a group of wild genotypes (Sudan-Ethiopian) and commercial varieties (Bertrand et al. 2006). Hybrids included in this study were vegetatively propagated and tested in Central America, demonstrating a yield earlier than and superior to traditional cultivars, greater stability across varied environments and high beverage quality provided by an intrinsic cup quality of ET parents (Bertrand et al. 2006, Bertrand et al. 2011). The NJ tree obtained from F1 intraspecific hybrids displayed four groups (Figure 1C). In group A, two subgroups were distinguished, in one of which the parental lines ET-25 and Rume Sudan 1 are grouped with hybrids L04A5 (one parent is ET-25) and L13A44 (one parent is Rume Sudan). Another subgroup contained the parental line E-531 that was used to generate L13A22 and L5A26. The genotype Caturra is clustered in group B with hybrids L09A22, L11A26, and L4A34, which are derived from Caturra. Despite being derived from Catuaí and ET-41, L04A42 was grouped in B because it shares alleles from ET-41 that are also present in L11A26 and L4A34. Group C comprises hybrids L02A30 and L13A12, which derive from Ethiopian accessions (ET-15 and ET-06, respectively). Although ET-41 (grouped in C) is not a parental line of L02A30 and L13A12, it might share alleles with ET-15 and ET-06, as previously described for the clustering of these three Ethiopian accessions (Anthony et al. 2001). Within group D, hybrids L04A20, L10A25, and L12A28 are derived from Rume Sudan, and sample Rume Sudan 2 also clustered in this group. Sample T5296 remained unclustered, positioned between groups A and B on one side and groups C and D on the other.
T5296 is a parental line of hybrids in groups A (L04A5, L13A44), C (L13A12), and D (L12A28); the alleles it shares with these hybrids favor its intermediate position in the NJ tree. These data constitute the first microsatellite analysis of F1 intraspecific hybrids of coffee. Coffee hybrids are an alternative for a profitable and sustainable production of coffee, and molecular tools might serve as a complementary descriptor for the identification, registration, and protection of new and improved coffee cultivars, particularly of those that are vegetatively propagated, including hybrids.
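The relationship between a hybrid and its putative parents can be screened with a simple rule: at every marker, each allele observed in the hybrid should also be present in at least one parent. The sketch below applies only this presence/absence check; it ignores allele dosage in the tetraploid genome, null alleles, and mutation, so a flagged marker is a prompt for re-inspection rather than proof of a wrong pedigree. All names and allele sizes are placeholders.

```python
def inconsistent_markers(hybrid, parent_a, parent_b):
    """Map each marker to the hybrid alleles not found in either parent."""
    flagged = {}
    for marker, alleles in hybrid.items():
        allowed = set(parent_a.get(marker, ())) | set(parent_b.get(marker, ()))
        unexplained = set(alleles) - allowed
        if unexplained:
            flagged[marker] = sorted(unexplained)
    return flagged


# Placeholder genotypes (allele sizes in bp are invented for illustration).
example_hybrid = {"marker_A": (100, 104), "marker_B": (200, 206)}
example_parent_a = {"marker_A": (100,), "marker_B": (200,)}
example_parent_b = {"marker_A": (104,), "marker_B": (202,)}

print(inconsistent_markers(example_hybrid, example_parent_a, example_parent_b))
```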
In summary, our study describes a detailed microsatellite DNA fingerprinting protocol that is easy to implement, low in cost (particularly when multiplex PCR is used), and capable of classifying coffee germplasm. The scope of this protocol was demonstrated by processing F1 intraspecific hybrids that had never been genotyped with microsatellites before, so that the results reported in this study provide further insights into the genetic diversity of coffee.