Molecular characterization of parents and hybrid progenies of conilon coffee

The objective of the present work was the molecular characterization of 11 parents and 101 hybrid progenies of conilon coffee, obtained through diallel crosses from the breeding program of the Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural (Incaper, ES, Brazil). The analyses were performed with 18 Simple Sequence Repeat (SSR) molecular markers, obtaining a total of 32 alleles. SSR markers were classified as moderately informative (PIC = 0.37), being efficient in characterizing individuals. High genetic diversity was verified in the 112 genotypes, based on the higher values of observed heterozygosity relative to the expected heterozygosity (0.55 and 0.44, respectively), negative values for the fixation index (F) (-0.14), and the formation of distinct groups by UPGMA. These results indicate high genetic variability among the conilon coffee genitors, which persisted at a similar level in the progenies. The average dissimilarity between parents was 0.29 and between progenies 0.34. The progenies 38 and 40 and the parent P11 were considered the most divergent in the study. The genetic variability found can be explored in the genetic breeding of the conilon coffee and guide crossings between diversified and compatible genetic materials, for the composition of novel cultivars for the state of Espírito Santo.


INTRODUCTION
Coffee growing is one of the main agricultural activities of socioeconomic importance in Brazil (Ocde-Fao 2020). The state of Espírito Santo stands out in this context, where two species of economic relevance predominate (Coffea arabica L. and Coffea canephora Pierre ex Froehner) and the cultivation of coffee represents the main source of income for most local growers (Conab 2020).
C. canephora is the second most cultivated species of the genus Coffea in the world, representing approximately 40% of the total production. The state of Espírito Santo stands out as the largest Brazilian producer of this species, where it is referred to as conilon coffee (Ferrão et al. 2019a). It is characterized as an allogamous crop with self-incompatibility of gametophytic type and great natural variability for different traits (Fonseca 1996, Aguiar et al. 2005), as well as high productive potential and tolerance to drought and high temperatures (Ferrão et al. 2007).
The conilon coffee can be propagated through seeds (sexually) or cuttings (asexually), whereby the consequences of incompatibility on the productivity and genetic variability of the progenies must always be contemplated (Fonseca 1996, Ferrão et al. 2007). The genetic self-incompatibility of C. canephora is an important reproductive characteristic that favors the formation of highly heterozygous populations with high genetic variability, as it ensures the absence of self-fecundation and failure of the crossings between related genotypes (Ferrão et al. 2019a).
The propagation by seeds ensures the natural variability of the species, being considered a simple method and the main strategy to obtain highly heterozygous offspring. However, heterogeneous crops hinder the application of crop treatments, which is undesirable for coffee growers. In turn, the plantation of clonal cultivars allows reducing this non-uniformity, becoming an important propagation system, especially when individuals that are superior for the target characteristics are identified (Ferrão et al. 2007, Ivoglo et al. 2008).
Because the productive capacity of the conilon coffee depends on the compatibility of the genotypes used in the planting, the clonal cultivars should consist of a combination of genetically divergent genotypes showing great diversity of compatible alleles in order to increase the pollination efficiency and ensure a successful production (Ferrão et al. 2019b). Genetic diversity in clonal plantations is essential to maintain productivity and grain quality, in addition to favoring a greater capacity to respond to biotic and abiotic changes (Ramalho et al. 2016, Moraes et al. 2018). Studies aiming at the selection of conilon coffee clones demonstrate greater genetic variability, stability, environmental adaptation and favorable agronomic characteristics when a combination of different genotypes is used in planting (Ramalho et al. 2016, Oliveira et al. 2018, Silva et al. 2019). The new clonal varieties of conilon coffee have been developed by grouping at least nine compatible genotypes (Ferrão et al. 2015, Ferrão et al. 2019d).
Accordingly, the genetic characterization of conilon coffee clones is essential for planning strategies for grouping progenies and composing new cultivars, with DNA markers being efficient techniques in these evaluations (Ferrão et al. 2019c). Among the different classes of molecular markers, Simple Sequence Repeats (SSRs) or microsatellites have been used in the characterization of the germplasm of several crop species (Ferrão et al. 2013, Trujillo et al. 2014, Carvalho et al. 2017, Silva et al. 2018). These markers consist of small sequences of one to six base pairs repeated in tandem (Litt & Luty 1989), are co-dominant and highly polymorphic, and have wide distribution in the genome (Turchetto-Zolet et al. 2017).
The characterization of parents and progenies via molecular markers has made it possible to assess genetic similarity and divergence among the accessions, helping in the efficient choice of parents for crossings. In addition, the molecular characterization has helped to identify elite materials and duplicates and to broaden the diversity among the available accessions, ensuring greater efficiency and speed in the evaluations, thus supporting programs to improve the coffee tree (Motta et al. 2014, Ferrão et al. 2015, Ogutu et al. 2016, Setotaw et al. 2020). The genetic breeding program of conilon coffee from the Incaper (Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural, ES, Brazil) is founded on different strategies and lines of research related to the development and recommendation of cultivars, selection of progenies, recombinations, intrapopulation improvement, phenotypic and molecular characterization, and germplasm maintenance. The strategies are established based on demand and available genetic variability. Therefore, the present work aimed to perform a molecular characterization of superior conilon coffee genitors and hybrid progenies obtained and selected from diallel crosses of Incaper's breeding program by means of microsatellite markers.

Genetic material
The genetic materials were obtained from the breeding program of C. canephora, provided by Incaper (Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural, ES, Brazil) in cooperation with Embrapa Café. A total of 101 progenies were evaluated, originating from controlled recombination in a diallel scheme (complete and partial) of eleven conilon coffee clones (P1 to P11) considered promising for coffee growing.
In order to obtain the 101 hybrid progenies, controlled crossings between the parents were initially performed from 2001 to 2004 at Incaper's Experimental Farm at Marilândia (FEM; coordinates 19°24' south latitude and 40°31' west longitude), in the northwestern region of the state of Espírito Santo. The hybrids obtained from the crossings were initially evaluated in experimental assays along with the parents at the farms of Marilândia (FEM) and Sooretama " west longitude) (Feitosa 1986), over six harvests (years) for different agronomical traits. Based on these evaluations, carried out at the level of plant, location and crossing, 101 superior hybrid progenies were selected. As asexual multiplication by cuttings is viable in the species, each selected progeny was cloned and multiplied in a nursery at FEM in 2011.
The 101 hybrid progenies, together with the parents, were evaluated in a field experiment (in the year 2012), implemented in a randomized block design with three replications, eight plants per plot and a spacing of 3.0 x 1.0 m. The cultivation and the conduct of the experiment followed the technical recommendations for the crop (Ferrão et al. 2012). The relationships between the parents and the progenies are shown in Tables I and II, respectively.

DNA extraction and quality
Leaf samples of each genotype (parents and progenies) were collected and lyophilized for extraction and purification of genomic DNA by the method described by Doyle & Doyle (1990), with modifications. Deproteinization of the material was accomplished in three steps with the addition of 24:1 chloroform-isoamyl alcohol (CIA). In the step of DNA precipitation, one volume of cold isopropyl alcohol and 1/3 volume of ammonium acetate (7.5 M) were used, eliminating the overnight step described in the original protocol. The quality/purity and concentrations of the total DNA were evaluated by spectrophotometry in NanoDrop equipment (Thermo Scientific 2000C). The adopted purity parameter was described by Barbosa (1998), considering the absorbance relation at 260 and 280 nm (A260/A280) to be ideal in the interval of 1.8 to 2.0.
Subsequently, DNA samples were diluted to a final concentration of 10 ng/μL.

Simple Sequence Repeat (SSR) markers screening
The magnitude and distribution of the genetic variability were estimated using microsatellite (SSR) markers. Eighteen microsatellite loci were used for the study (Table III), 15 of which derived from genomic DNA sequences and three from EST-SSRs. These primers were developed by Missio et al. (2009) and JA Silva (unpublished data), and all feature potential for amplification in C. canephora. The markers were initially tested among the parents, and those presenting polymorphic pattern were selected for genotyping of the progenies.
The amplifications were carried out in a Veriti® thermocycler (Applied Biosystems) under the following conditions: initial denaturation at 94°C for 2 minutes, followed by ten touchdown cycles with denaturation at 94°C for 30 seconds, annealing at 66-57°C (decrements of 1°C at each cycle) for 30 seconds, and extension at 72°C for 30 seconds; 30 cycles with denaturation at 94°C for 30 seconds, annealing at 57°C for 30 seconds, and extension at 72°C for 30 seconds; and final extension at 72°C for 8 minutes.
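As an illustration of the cycling program described above, the following Python sketch lists the annealing temperature of each cycle and the nominal run time. The function name and the decision to ignore ramp times between temperatures are assumptions made for this example and are not part of the original protocol.

```python
def touchdown_schedule(start=66, stop=57, step=1, plateau_cycles=30):
    """Return the annealing temperature (°C) of every PCR cycle.

    Hypothetical helper: touchdown cycles step down from `start` to `stop`,
    followed by `plateau_cycles` cycles held at `stop`.
    """
    touchdown = list(range(start, stop - 1, -step))  # 66, 65, ..., 57
    return touchdown + [stop] * plateau_cycles


schedule = touchdown_schedule()

# Hold times in seconds, taken from the text; ramping is ignored (assumption).
initial_denaturation = 2 * 60
cycle_seconds = 30 + 30 + 30  # denaturation + annealing + extension
final_extension = 8 * 60

total_minutes = (
    initial_denaturation + len(schedule) * cycle_seconds + final_extension
) / 60

print("cycles:", len(schedule))
print("touchdown annealing temperatures:", schedule[:10])
print("nominal run time (min):", total_minutes)
```

The nominal time excludes heating and cooling between steps, so the real run on a thermocycler would take longer.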
The amplified fragments were separated in 10% polyacrylamide gel in the presence of 1X TBE buffer (0.089 M Tris, 0.089 M boric acid and 0.002 M EDTA), at 100 V for approximately 3 hours. The gels were stained by immersion in ethidium bromide solution (0.25 μg/mL) for 20 minutes and subsequently photographed under UV light in a photodocumentation system (ChemiDoc MP Imaging System, Bio-Rad®). The molecular size of the amplified fragments was estimated with a molecular weight marker (ladder) of 50 base pairs (bp) (Ludwig Biotechnology).

Statistical analyses
Recording of the molecular data was performed based on the polymorphisms of the PCR products, generating a coding matrix of the individual genotypes considering the number of alleles (1, 2, 3, ... n) in each locus. In this numerical coding matrix, the homozygous genotypes were coded as 11, 22, 33 ... nn, and the heterozygous genotypes as 12, 13, 23, and so on. Based on this numerical matrix, the following genetic variability parameters were calculated for each microsatellite locus: number of alleles per locus (A), observed (Ho) and expected (He) heterozygosity, and polymorphism information content (PIC). The genetic distances (genetic dissimilarity) between the accession pairs were estimated from the complement of the weighted index. The clustering analysis was performed by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). The cophenetic correlation coefficient (CCC) was estimated to evaluate the consistency of the groupings and the reliability of the data. All analyses described above were performed using the program Genes (Cruz 2016).
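The per-locus parameters can be illustrated with a minimal Python sketch. It assumes that each genotype is stored as a pair of allele codes, that He is one minus the sum of squared allele frequencies (no small-sample correction), that PIC follows the formula of Botstein et al. (1980), and that the fixation index is F = 1 - Ho/He. The Genes program may apply different estimators, and the function name is hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one SSR locus.

    genotypes: list of (allele, allele) tuples, e.g. (1, 1) for the
    homozygote coded 11 and (1, 2) for the heterozygote coded 12.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]

    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under random mating (uncorrected, an assumption).
    he = 1 - sum(p ** 2 for p in freqs)
    # PIC (Botstein et al. 1980): He minus the sum over allele pairs of 2*pi^2*pj^2.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = he - cross
    # Fixation index; negative values indicate an excess of heterozygotes.
    f = 1 - ho / he if he > 0 else float("nan")

    return {"A": len(counts), "Ho": ho, "He": he, "PIC": pic, "F": f}


# Toy locus with four individuals (invented data, not from the study).
example = locus_summary([(1, 2), (1, 2), (1, 1), (2, 2)])
print(example)
```

Averaging these per-locus values over all loci gives study-level means; note that the mean of per-locus F values need not equal F computed from the mean Ho and He.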
Subsequently, the genetic dissimilarity matrix obtained with the Genes program was exported to the statistical software R (R Core

RESULTS
Among the 18 markers initially tested in the parents, 14 presented a polymorphic pattern and were selected for genotyping of the progenies. A total of 32 alleles were detected, varying from two to three alleles per locus (mean of 2.29). Figure 1 shows the allele frequencies observed for each primer. The A1 allele had the highest allele frequency for most primers, except for SSRCa040, SSRCa052, SSRCa084 and ESTCOF021. The primers GENCOF29, SSRM24, SSRCa084 and SSRCa095 presented the highest allele frequency (Figure 1).
The parameters Ho, He, F and PIC are described in Table IV. As for the fixation index (F), the averages for the parents, the progenies and the combined data were all negative (-0.35, -0.13 and -0.14, respectively). The values of PIC varied from 0.08 (SSRCa052 and SSRCa088) to 0.58 (SSRM24) among the parents, with a mean of 0.35 (Table IV). For the progenies and joint data, the PIC values varied from 0.15 (SSRCa052) to 0.59 (SSRCa095), with a mean of 0.37. In the analyzed accessions of conilon coffee, the loci can be classified as moderately informative (0.5 > PIC > 0.25), with SSRM24, SSRCa084 and SSRCa095 standing out for their high informative power (PIC > 0.5) (Botstein et al. 1980).
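The PIC classes used above can be written as a small Python helper. This is a sketch only: the "slightly informative" label for PIC below 0.25 follows the usual reading of Botstein et al. (1980), while the function name and the handling of values exactly at 0.25 or 0.5 are assumptions.

```python
def classify_pic(pic):
    """Return the informativeness class of a marker from its PIC value."""
    if pic > 0.5:
        return "highly informative"
    if pic > 0.25:
        return "moderately informative"
    return "slightly informative"


# PIC values quoted in the text: parent minimum, overall mean, parent maximum.
for value in (0.08, 0.37, 0.58):
    print(value, classify_pic(value))
```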
The frequencies of dissimilarity were calculated individually for the parents and progenies, and an analysis of the joint data was performed subsequently. For the parents, the values of dissimilarity varied from 0.08 to 0.45, with a mean of 0.29. The smallest genetic distance (0.08) was found for the parents P1 (clone 02) and P5 (clone 23). The parents P2 (clone 03) and P11 (clone 153) were the most dissimilar (0.45). Two groups (G1 and G2) were formed in the dendrogram, with cut-off established at 92.51% and CCC value of 0.77 (Figure 2). The G1 group was formed by the parents P1, P5, P6 and P11 and the G2 group by the parents P2, P3, P4, P7, P8 and P10. As expected, the most similar parents (P1 and P5) were placed in the same group (G1), and the most divergent were separated, P2 in G2 and P11 in G1.
In the analysis of dissimilarity among the progenies, the values varied from 0.00 to 0.65, with a mean of 0.34. The smallest degree of dissimilarity was found between the progenies 8 and 9 (07 x 23 and 07 x 23), whereas the progenies 4 and 40 (02 x 24 and 02 x 83) were the most genetically distant.
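The pairwise dissimilarity values can be illustrated with the following Python sketch. It assumes that the weighted index has the form S = (1/2) Σ p_j c_j, where c_j is the number of alleles shared by the two individuals at locus j and p_j = a_j / A weights each locus by its number of alleles a_j relative to the total number of alleles A, with dissimilarity D = 1 - S. This form is an assumption about how the Genes program defines the index, and the function name and data are hypothetical.

```python
from collections import Counter


def weighted_dissimilarity(ind_a, ind_b, alleles_per_locus):
    """Complement of a weighted similarity index between two individuals.

    ind_a, ind_b: lists of (allele, allele) tuples, one per locus.
    alleles_per_locus: number of alleles observed at each locus.
    """
    total_alleles = sum(alleles_per_locus)
    similarity = 0.0
    for g_a, g_b, a_j in zip(ind_a, ind_b, alleles_per_locus):
        # Multiset intersection counts shared alleles: 0, 1 or 2.
        shared = sum((Counter(g_a) & Counter(g_b)).values())
        similarity += (a_j / total_alleles) * shared
    return 1 - similarity / 2


# Invented two-locus example (2 and 3 alleles per locus).
alleles = [2, 3]
x = [(1, 1), (1, 2)]
y = [(1, 2), (3, 3)]
print(weighted_dissimilarity(x, x, alleles))  # identical genotypes
print(weighted_dissimilarity(x, y, alleles))
```

Under this definition, two accessions with identical genotypes at every locus have a dissimilarity of zero, which is the situation reported for progenies 8 and 9.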
Heterogeneity of the samples was evident in the clustering analysis, with distribution of the clones into 14 different groups (Figure 3). The established cut-off was 69.35%, and the CCC value was 0.56. Among the formed groups, G2 and G3 were each composed of a single accession: 38 (83 x 149) and 40 (02 x 83), respectively. In the groups G4, G5, G8, G9, G10 and G11, all clones are half-siblings or full-siblings, while in the other groups the genotypes were clustered independently of ancestry.
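The clustering and its consistency check can be sketched in plain Python as follows. The example builds a UPGMA tree from a dissimilarity matrix, records the height at which each pair of accessions first joins (the cophenetic distance), and computes the CCC as the Pearson correlation between original and cophenetic distances. The function names and the toy matrix are hypothetical, and the cut-off rule used to define the groups in the dendrograms is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Run UPGMA on a symmetric matrix and return the cophenetic matrix."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def average(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average(clusters[p[0]], clusters[p[1]]),
        )
        height = average(clusters[a], clusters[b])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented dissimilarities among four accessions.
toy = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.2],
    [0.6, 0.7, 0.2, 0.0],
]
toy_coph = upgma_cophenetic(toy)
toy_ccc = cophenetic_correlation(toy, toy_coph)
print(toy_coph)
print(round(toy_ccc, 2))
```

A CCC close to 1 means the dendrogram preserves the original pairwise dissimilarities well; lower values, such as the 0.56 reported for the progenies, indicate more distortion in the tree.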
The dissimilarity values considering the complete data set (parents and progenies) were identical to those obtained in the analysis of the progenies. The variation ranged from 0.00 to 0.65, with a mean of 0.34; the lowest degree of dissimilarity was found between the progenies 8 and 9, and the highest between 4 and 40. Thirteen groups were observed in the dendrogram (Figure 4), with cut-off established at 67.57% and CCC of 0.56. Among the formed groups, G2, G3 and G13 each contained a single accession, respectively the progenies 38, 40 and 30.
The parents were distributed into four different groups in the dendrogram (G5, G7, G10 and G12), while the other groups were composed solely by progenies. The grouping tendency was similar to the individual analysis of the parents, where the most similar ones (P1 and P5) remained in the same group (G7) while the most distant ones (P2 and P11) were separated into the groups G12 and G5, respectively.
The genitor presenting the highest mean dissimilarity (0.39) in relation to the progenies was P11 (clone 153), and the least genetically distant ones (0.31) were P3 (clone 07), P4 (clone 11), P7 (clone 73) and P10 (clone 149), which explains their clustering with most of the accessions in the group G12. Regarding the progenies, not all clones were grouped with their respective parents. For instance, accession 54, which descends from the clones P8 x P10, was grouped in G5 together with the parents P6, P9 and P11, which also occurred with other clones.

DISCUSSION
Molecular markers, including SSRs, have been used as alternative and complementary techniques to classical genetics in the characterization of plant germplasm. The advancement of these techniques has overcome the limitations encountered in assessments based only on agromorphological parameters, thus ensuring greater efficiency and speed in the evaluation of the available genetic variability (Caixeta et al. 2015). In the coffee tree, a perennial crop in which most of the characteristics of economic interest are polygenic and may take many years to be expressed, the use of molecular markers may accelerate the evaluation and selection processes (Ferrão et al. 2019c).
In this study, the variability within the gene pool of elite conilon coffee parents and their hybrid progenies clearly demonstrated the efficiency of SSR markers in quantifying and detecting high genetic variability in the analyzed accessions. Moreover, the PIC allowed characterizing the markers as overall moderately informative, confirming that they were efficient in detecting variability among the accessions. Other authors have also confirmed the efficiency of SSR markers in their analyses. Ferrão et al. (2013) evaluated the efficiency of three classes of markers (RAPD, AFLP and SSR) in the analysis of genetic diversity of conilon coffee and concluded that the SSR marker was the most informative, due to its codominant nature and high reproducibility.
Similar numbers of alleles per locus and high genetic variation among genotypes of conilon coffee were also found by other authors. Studies with SSR markers in C. canephora have reported average numbers of alleles per locus of 4.84 (Ferrão et al. 2013), 2.4 (Motta et al. 2014) and 4.06 (Syafaruddin et al. 2017). The high variation found may be attributed to the allogamous nature and the genetic self-incompatibility of the species. The self-incompatibility ensures natural variability of the species, promoting the formation of highly heterozygous populations, with high genetic variability for different characteristics (Ferrão et al. 2019b). Hence, high variation between genotypes was expected, as were values of Ho higher than those of He, confirming the high degree of heterozygosity and genetic diversity among the accessions. Another parameter that confirms this high diversity is the fixation index (F), which had negative values for most of the loci. Negative values for the fixation index are expected for allogamous species and are common when Ho is greater than He, indicating excess of heterozygotes (Wright 1965).
The results for the genetic variability parameters agree with other studies carried out in C. canephora with SSR markers. Hendre & Aggarwal (2014) found He values from 0.13 to 0.85 and Ho values from 0.00 to 1.00. Syafaruddin et al. (2017) observed He values ranging from 0.10 to 0.80 and Ho values from 0.11 to 0.94. The results confirm the high genetic variability found between accessions of the species.
Heterozygosity as a parameter of genetic variability indicates that the high variability among the conilon coffee parents is similar and prevails in the progenies, implying possibilities of greater genetic gains. The high variability among the parents was expected, since they are elite materials from Incaper's breeding program, standing out as promising clones for coffee production in Espírito Santo.
The variability found in the parents and maintained in the progenies indicates that, in genetic improvement work, the variability observed between the parents can be used in crossings aiming at greater variability in the segregating generations. According to Borém & Miranda (2013) and Ferrão et al. (2019c), crossings involving genetically different parents may produce a high heterotic effect and the breeder may obtain hybrids, by selection and cloning, with variability similar or even superior to those achieved by the best clones. Thus, the results make it possible to identify the parents that can be used in targeted crosses, notably the parents P2 (clone 03) and P11 (clone 153), which were the most dissimilar.
The genetic variability present in the parents and progenies and the heterogeneity evidenced by the formation of different groups among the progenies (Figure 3) can be explored for the definition of future improvement strategies. Knowledge of the diversity between the genotypes can guide hybridizations between the most divergent accessions and the selection of materials to compose a novel clonal hybrid cultivar with a broad genetic basis. Moreover, the generated information may help in the maintenance of the Active Germplasm Bank (AGB) and the exchange of accessions between institutions, avoiding work with genetically similar materials.
Characterizations by molecular markers performed by Souza et al. (2013), Solórzano et al. (2017) and Huded et al. (2020) also revealed wide genetic diversity in C. canephora and reinforce the importance of this information for the hybridization between germplasms and for the composition of clonal varieties.
The results of this study also made it possible to identify duplicate accessions in the set of samples evaluated. The accessions 8 and 9 are progenies from the same parents (07 x 23) and, as they presented 100% similarity, can be considered duplicates. Such information is important because the identification of duplicate accessions by molecular characterization represents a strategy to reduce the time and financial resources needed for germplasm conservation (Valls 2007). Several studies with molecular markers carried out in coffee trees (Solórzano et al. 2017, Baltazar & Fabella 2020) and in several other species (Gasi et al. 2016, Sochor et al. 2019) have identified duplicate accessions in germplasm banks.
In the individual analysis of the progenies, samples 38 (83 x 149) and 40 (02 x 83) formed isolated groups (G2 and G3). The comparison of these materials with all the other progenies revealed their higher mean dissimilarity (0.49 and 0.41, respectively), demonstrating the high genetic divergence between these accessions and the remaining genotypes of this study and justifying the formation of isolated groups. These progenies are half-siblings, sharing the parent P8 (clone 83), and are promising for future genetic improvement work.
Ferrão et al. (2017) evaluated the genetic diversity of the same progenies of this study by means of 13 agronomical traits. The authors verified that the progenies 9, 28, 29, 30, 46, 59, 63 and 67 were the most dissimilar for the studied characteristics. Relating the results of the cited authors to those obtained in this study, it is observed that these genotypes were also dissimilar here, at the DNA level, being separated into different groups in the dendrogram (Figure 3). The results indicate that these materials have molecular and phenotypic divergence.
In the joint analysis of the 112 genetic materials studied in the species (parents and progenies), those considered the most divergent were the progenies 38 (83 x 149) and 40 (02 x 83) and the parent P11 (clone 153). The progenies 38 and 40 remained in isolated groups (Figure 4), confirming the high divergence in these accessions. The formation of groups composed of a single accession indicates that these are the most divergent in relation to the others (Vieira et al. 2009), which can be explored to potentiate the hybrid performance of the germplasm in future breeding works. Moreover, this information represents an important aid in studies to identify desirable alleles and characteristics of interest for the coffee crop.
Not all progenies were grouped with their respective parents, which is attributed to the proximity between some of them. For instance, the accession 54 (P8 x P10) was clustered with other parents (P6, P9 and P11), which was also observed for other progenies. The groups are formed in the dendrogram based on the estimates of dissimilarity, such that the genotypes exhibiting similarities tend to remain in the same group (Cruz et al. 2011). The clustering of progenies with different genitors may arise from a relative proximity between these parents. This could be observed for the accession 54, where the parents P8 and P10 possess a mean dissimilarity of 0.29 and 0.26, respectively, in relation to the parents P6, P9 and P11, explaining the disposition in the dendrogram.
The obtained results allowed the verification of genetic divergence between the parents and the progenies, as well as high variability that can be explored in the genetic improvement of the conilon coffee. Notably, the parents P2 (clone 03) and P11 (clone 153) and progenies 4 (02 x 24), 38 (83 x 149) and 40 (02 x 83) were the most divergent in the study. This information can be used in directed crossings between contrasting accessions and to ensure the adequate maintenance of the germplasm, contributing to prevent possible genetic losses during the multiplication of the genetic material. It is worth noting that the most similar accessions, taking into account the effect of self-incompatibility and as long as they also present adequate agronomical performance, can be used in transgressive segregation with the aim of obtaining progenies with phenotypes that are superior to both parents.

CONCLUSIONS
The SSR markers, classified as moderately informative, were efficient in detecting genetic variability between the genotypes of conilon coffee. The parameters of genetic variability analyzed indicated high genetic diversity among the parents, which remains in the progenies, with emphasis on the parents P2 (clone 03) and P11 (clone 153) and the progenies 4 (02 x 24), 38 (83 x 149) and 40 (02 x 83), which were considered the most divergent in the study. Through the performed molecular characterization, it is possible to select the most dissimilar materials to create, together with superior agronomical and quality traits, possible cultivar compositions for the state of Espírito Santo. (Incaper, Brazil)