Characterization and genetic diversity of Coffea canephora accessions in a germplasm bank in Espírito Santo, Brazil

The state of Espírito Santo is the major producer of Coffea canephora in Brazil. Knowledge of genetic reserves is fundamental to plant breeding. Therefore, the present study characterized and analyzed the genetic diversity of 600 C. canephora accessions from the germplasm bank of Incaper based on 38 traits evaluated in 24-30-month-old plants. Further, the predominant descriptors or traits were identified, and high phenotypic variability was determined. Genetic distances for the grouped (Gower), quantitative, and qualitative datasets were 0.48, 0.61, and 0.92, respectively, with accessions 76 (Conilon) and 407 (Robusta) being the most divergent ones at Incaper. In clustering using the Tocher optimization method, 30 groups were formed, with three accessions introduced from Epamig’s Robusta collection being the most dissimilar ones. Graphical dispersion analysis using the principal coordinate method revealed the predominance of three groups formed by the Robusta, Conilon, and hybrid Robusta × Conilon genotypes.


INTRODUCTION
Coffee cultivation, as an important activity of the agricultural sector, is highly relevant in the context of the socioeconomic development of Brazil. In particular, in the state of Espírito Santo, coffee production has become a major component of the agribusiness sector, occupying an area of approximately 460,000 hectares (CONAB 2020). Coffea canephora is the second most cultivated species in the world, representing 40% of overall coffee production, and Espírito Santo stands out as the largest Brazilian producer of this species, known in the state as Conilon coffee (Ferrão et al. 2019).
Coffee (Eudicotyledoneae) belongs to the genus Coffea of the angiosperm family Rubiaceae, with 124 species cataloged in the literature to date (Davis et al. 2011). Among these, only C. arabica and C. canephora are economically important, while the others are of fundamental relevance in genetic breeding programs for hybridization and gene transfer (Ferrão et al. 2019, Mistro et al. 2019). C. canephora is a diploid, perennial, allogamous species, with easy vegetative propagation and self-incompatibility of the gametophytic type (Conagin and Mendes 1961, Ferrão et al. 2017c, Moraes et al. 2018), and it is known by denominations such as Kouillou (Conilon) and Robusta (Charrier and Berthaud 1988). At the beginning of its expansion in the world, the species was known as Robusta. Based on records, it was introduced to Espírito Santo in Brazil in 1912, when the Kouillou variety, locally called Conilon, was received. It became significantly widespread after the 1970s, with the mother plants selected by the farmers themselves or introduced over the years, enabling the establishment of populations with wide genetic variability (Ferrão et al. 2019).
In 1985, the Institute of Research, Technical Assistance and Rural Extension of the State of Espírito Santo (Incaper), initiated a research program to explore the great variability of C. canephora through diverse selection and breeding strategies. In the 35 years of research, nine clonal and two seminal cultivars have been developed and made available to producers in Espírito Santo. These cultivars are the basis of the existing crops in the state. Incaper also maintains a germplasm bank of C. canephora in the field, with a large number of accessions, including cultivars, progenies, hybrids, and clones. Analysis of genetic variability among accessions, together with the maintenance and adequate characterization of germplasm, is vital to exploit the genetic diversity and trait reserve of the present collection for developing new genotypes through selection and recombination strategies (Ferrão et al. 2019). Characterization involves the evaluation of phenotypic descriptors defined a priori for the species, together with the assessments of biotic and abiotic factors, quality, and productive potential, among other variables (Ferrão et al. 2017a).
To this end, the present study characterized 600 C. canephora accessions from the germplasm bank of Incaper, maintained at three locations in Espírito Santo, for different agronomic traits based on specific descriptors, combined with the analysis of genetic variability and divergence.

Genetic materials and evaluation sites
Incaper has maintained a C. canephora germplasm bank at the experimental farm in Marilândia, Espírito Santo, Brazil, since 1988. In 2016, actions to renew, duplicate, and expand this collection were initiated. Between January and May 2017, the collection was planted in the field in three coffee growing regions of Espírito Santo, including the experimental farms in Marilândia (FEM), Sooretama (FES), and Bananal do Norte (FEBN), located in the northwest, northeast, and south of the state, respectively. In the field, 3-5 plants of each of the 600 accessions were planted at a spacing of 3.0 m × 1.20 m in a single plot. Management and fertilization followed the crop recommendations by Incaper, and included various activities such as cycle pruning and irrigation for the maintenance of the initial establishment stand; no pest and disease control was performed (Ferrão et al. 2017b). The accessions are selected progenies from crops in different municipalities of Espírito Santo and southern Bahia, progenies from controlled crosses at Incaper and genotypes introduced from the Robusta collections.

Assessments, characterization, and data collection
From May to December 2019, 31 qualitative and 7 quantitative traits were evaluated at each location using 24-30-month-old plants. The qualitative traits were assessed based on coffee descriptors (Fazuoli et al. 1994, IPGRI 1996, Brasil 2000), evaluating variables related to the plant, branches, leaves, cycle, fruits, and seeds as well as those related to response to pests, diseases, and drought, using categorical scales (described in Tables 1 and 2). The following quantitative traits were derived from evaluations at harvest and post-harvest: grain yield at the first harvest (60 kg bags of processed coffee per hectare); percentage of floaters; ratio of cherry coffee weight to processed coffee weight and that of natural coffee weight (dry at 11% moisture) to processed coffee weight (2 kg sample of cherry coffee obtained at the harvest); and percentage of average flat grains with sieves above 13, mocha grains with sieves above 10, and small grains (residue) (flat below 13 and mocha below 10).

Data analysis
The present data represent the mean of values from the three locations explored. For traits 1 to 26, which originated from multi-categorical evaluations based on categorical scales, the consistency of data from the three sites was analyzed. When divergence between data was observed, new data were collected and the results were revalidated.
Descriptive analysis was performed on the data matrix comprising 38 traits (qualitative and quantitative) for the 600 accessions, which showed uniform development. The frequency of genotypes classified in each descriptor and their predominance in the germplasm were analyzed for each trait. Genetic divergence for 562 accessions was analyzed based on the complete set of qualitative and quantitative traits evaluated in the same phase using multivariate statistical procedures (Regazzi and Cruz 2020). The qualitative data (multi-categorical) were analyzed using the simple coincidence index method, and the quantitative data were analyzed using the standardized Euclidean mean distance. The matrices of the two datasets were grouped using the mean Gower method. In each dataset, the cophenetic correlations and genetic distances between the more and less similar genotypes were estimated. The genotypes were grouped with the joint Gower matrix using the modified Tocher optimization method, unweighted pair group method using arithmetic averages (UPGMA), and graphical dispersion analysis using the principal coordinate method. Statistical analyses were performed using GENES (Cruz 2016) and R (R Core Team 2019).
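The three distance measures named above can be illustrated with a minimal Python sketch. It assumes the textbook definitions: a simple coincidence (matching) dissimilarity for categorical traits, a mean Euclidean distance on z-scores for quantitative traits, and the classical Gower formulation (trait-wise mismatch or range-scaled absolute difference, averaged over all traits). The function names are hypothetical, and how GENES standardizes and combines the two matrices is not stated in the text, so this is not a reproduction of the authors' exact computation.

```python
import numpy as np

def simple_matching_distance(cat):
    """1 - proportion of coinciding categories between each pair of accessions."""
    cat = np.asarray(cat)
    # Compare every pair of rows trait by trait; True marks a mismatch.
    mismatches = cat[:, None, :] != cat[None, :, :]
    return mismatches.mean(axis=2)

def standardized_mean_euclidean(quant):
    """Mean Euclidean distance computed on z-scores (sample standard deviation)."""
    quant = np.asarray(quant, dtype=float)
    z = (quant - quant.mean(axis=0)) / quant.std(axis=0, ddof=1)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))

def gower_distance(cat, quant):
    """Textbook Gower distance over mixed traits (assumed formulation)."""
    cat = np.asarray(cat)
    quant = np.asarray(quant, dtype=float)
    ranges = quant.max(axis=0) - quant.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant traits contribute zero distance
    cat_part = (cat[:, None, :] != cat[None, :, :]).astype(float)
    quant_part = np.abs(quant[:, None, :] - quant[None, :, :]) / ranges
    # Average the per-trait dissimilarities over all traits of both kinds.
    return np.concatenate([cat_part, quant_part], axis=2).mean(axis=2)
```

Each function returns a square, symmetric matrix with zeros on the diagonal, one row and column per accession, which is the form expected by the clustering and ordination steps.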

Characterization of genotypes based on descriptors and complementary traits
Based on the data obtained through the evaluation of genotypes at the three sites, the average value of each trait was defined for each genotype, and descriptive analysis was performed to determine the predominance of each descriptor in Incaper's germplasm bank. Table 1 presents the average frequencies (in percentage) of 26 phenotypic descriptors in the studied accessions: plants (n = 3), branches (n = 2), leaves (n = 8), fruits (n = 5), seeds (n = 6), and cycle (n = 2). Table 2 presents the average frequencies of 12 complementary traits in the studied accessions: response to diseases (n = 2), pests (n = 2), and drought (n = 1) and quantitative traits related to production (n = 1) and after harvest (n = 6). A summary of the description of each trait, together with a discussion of the most relevant aspects, is presented below.

Plants:
Regarding the shape of the plants, almost all accessions (96.99%) were classified as cylindrical-conic, with no variation in this characteristic among the three sites. Among the Robusta accessions, only two were classified as cylindrical. Regarding plant height, the majority of the genotypes showed medium height (52.25%), followed by high (21.70%), low (13.52%), very high (9.36%), and very low (3.17%). Of the accessions classified as low at all evaluated sites, three clones were identified as Robusta, which generally has larger plants. Regarding the diameter of the plant, many accessions had a large diameter (60.77%), followed by very large (21.70%) and medium (15.03%). Over 82% of the accessions had a large or very large canopy diameter, and over 40% of such accessions were identified as Robusta. Regarding the internode length of the stem, 66.44% accessions were classified as medium, 29.22% as long, and 4.34% as short.
Leaves: Over 90% of the accessions had leaves classified as medium or long in terms of length and medium or wide in terms of width. In general, C. canephora plants bear three types of leaves: elliptical, oval, and lanceolate. Across the three sites studied, 60.93% leaves were lanceolate, 35.06% were elliptical, and 4.01% were oval, and there were variations in leaf shape among the three sites. Leaf color in the young phase was bronze-green (57.43%), bronze (34.89%), or green (7.68%), whereas that in the adult phase was predominantly dark green (98.66%). There were no purple leaves. Leaf edge intensity and depth of the secondary vein were classified as medium in over 70% accessions.

Seeds:
The seed length was medium, long, and short for 52.37%, 27.29%, and 20.34% accessions, respectively. The seed width was classified as medium, wide, and narrow for 60.17%, 27.29%, and 12.54% accessions, respectively. The seed thickness was classified as medium for 63.73% accessions. The endosperm color was green in 60.88% accessions and yellow in the rest (39.12%). The silver skin shade was dark and light in 58.35% and 41.65% accessions, respectively. The degree of adhesion of the silver skin was weak, medium, and strong in 39.12%, 52.28%, and 8.60% accessions, respectively.
Cycle: The ripening cycle, evaluated at the first harvest when over 80% fruits were ripe, showed early (46.95%) and medium (32.03%) maturation in most accessions, followed by late (15.93%), very early (4.41%), and very late (0.68%). Overall, fruit maturation in all accessions was earlier in FES, located in the northeastern region of Espírito Santo. The genotypes that matured before April 15, from April 15 to May 31, from June 1 to June 30, from July 1 to July 31, and after July 31 were classified as very early, early, medium, late, and very late cycle, respectively.

Responses to diseases, pests, and drought:
The responses of genotypes to diseases and pests were assessed on a scale from 1 to 9, where 1 indicates resistant genotypes and 9 indicates highly susceptible genotypes, and the average of the highest score at the three sites was calculated. Drought response was assessed on a scale from 1 to 3, where 1 indicates susceptible genotypes and 3 indicates tolerant genotypes. Disease incidence was higher at FES, located in the northeastern region. Specifically, 37.89% genotypes were resistant (scores 1 to 3), 14.53% were moderately susceptible, and 47.58% were susceptible (scores 7 and 9) to Hemileia vastatrix infection. The majority of the accessions were moderately susceptible to Cercospora coffeicola infection, with 71.29% accessions achieving a score of 5. Regarding pests, 64.44% accessions were moderately susceptible to coffee leaf miners (score 5), but the majority were moderately resistant (57.34%) and some were resistant (24.28%) to rosette cochineals. Regarding drought tolerance, only 8.21% accessions were highly tolerant (score = 3), while the rest showed moderate (56.11%) to low (35.68%) tolerance. Disease and pest prevalence was high (at least at some locations) and was helpful in characterizing the germplasm in terms of resistance and susceptibility.
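The way such scores translate into class frequencies can be sketched as follows. The cut-offs follow the classes reported above for Hemileia vastatrix (scores 1 to 3 resistant, 7 and 9 susceptible); treating every intermediate value as moderately susceptible is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter

def classify_score(score):
    """Map a 1-9 disease or pest score to a response class."""
    if not 1 <= score <= 9:
        raise ValueError("score must lie on the 1-9 scale")
    if score <= 3:
        return "resistant"
    if score >= 7:
        return "susceptible"
    # Assumption: anything between 3 and 7 counts as moderately susceptible.
    return "moderately susceptible"

def class_frequencies(scores):
    """Percentage of accessions falling in each response class."""
    counts = Counter(classify_score(s) for s in scores)
    total = len(scores)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Applied to the per-accession scores of one disease, `class_frequencies` yields percentages of the kind summarized in Table 2.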

Quantitative traits (complementary):
Quantitative traits were presented as the average of values at the three sites, representing the data of the first harvest. On average, 68.56% accessions showed productivity below 35 bags ha⁻¹ and only 0.54% showed productivity exceeding 70 bags ha⁻¹. The proportion of floaters varied from 0% to 100% in FEM and FES and from 0% to 60% in FEBN. The average proportion of floaters across the three sites was between 21% and 40%. Regarding post-harvest coffee yield, in the majority of the accessions, the ratio of cherry coffee weight to processed coffee weight ranged from 4.1 to 5.0 (53.37%) and that of dry natural to processed coffee weight ranged from 1.5 to 2.0 (54.71%). For the classification of grains, the sum of flat grains retained in sieves 13, 15, and 17; the sum of mocha grains retained in sieves 10, 11, and 12; and the residue (grains retained at the bottom of these sieves) were considered. The average proportions of flat grains, mocha grains, and residue were 57%, 27%, and 17.19%, respectively. Given that a high proportion of flat grains is a desirable trait to select genetic material for plant breeding, most genotypes studied here appear suitable as genetic material, with the proportion of flat grains exceeding 50% and that of mocha grains being below 30%.
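The grain classification arithmetic described above can be written out as a short sketch. It assumes that the proportions are computed by weight from the amounts retained in each sieve, which the text does not state explicitly; the function names and the gram units are hypothetical. The 50% and 30% limits in the second function are the ones mentioned in the paragraph above.

```python
FLAT_SIEVES = (13, 15, 17)   # sieves retaining flat grains
MOCHA_SIEVES = (10, 11, 12)  # sieves retaining mocha grains

def grain_proportions(flat_g, mocha_g, residue_g):
    """Percentages of flat grains, mocha grains, and residue.

    flat_g and mocha_g map sieve number to grams retained (assumed units);
    residue_g is the amount collected at the bottom of the sieves.
    """
    flat = sum(flat_g.get(s, 0.0) for s in FLAT_SIEVES)
    mocha = sum(mocha_g.get(s, 0.0) for s in MOCHA_SIEVES)
    total = flat + mocha + residue_g
    if total <= 0:
        raise ValueError("sample weight must be positive")
    return {
        "flat": 100.0 * flat / total,
        "mocha": 100.0 * mocha / total,
        "residue": 100.0 * residue_g / total,
    }

def has_desirable_grain_profile(proportions):
    """Flat grains above 50% and mocha grains below 30%."""
    return proportions["flat"] > 50.0 and proportions["mocha"] < 30.0
```

For a hypothetical 500 g sample with 300 g of flat grains, 120 g of mocha grains, and 80 g of residue, the proportions are 60%, 24%, and 16%, which meets both limits.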
Overall, the frequency distributions exhibited a wide variability. Akpertey et al. (2019) characterized Robusta species based on morphology and found that few genotypes differed in terms of the most prevalent traits.

Genetic divergence of accessions in Incaper's germplasm bank
Genetic divergence was analyzed using multivariate statistics with the data of 562 accessions evaluated for 38 traits (qualitative and quantitative). The pairs of genotypes with greater similarity and dissimilarity differed among the three data matrices (qualitative, quantitative, and grouped). Nonetheless, the magnitude of cophenetic correlation coefficients was comparable (0.614-0.652) (Table 3). The greatest genetic distance was observed in the qualitative data matrix (0.920), followed by the quantitative data (0.613) and grouped data (0.472) matrices. The shortest genetic distance was observed in the quantitative data matrix (0.027), between accessions 191 and 368, which were selected in different years and from different locations. Accessions 76 and 407 were identified as the most divergent in this study, with a distance of 0.920. These accessions were also phenotypically distinct at the field level, exhibiting typical traits of Conilon (76) and Robusta (407). This pair of accessions also presents the greatest genetic distance observed so far in studies based on phenotypic and/or molecular data from Incaper's research program (Fonseca et al. 2006, Ferrão et al. 2009, Ferrão et al. 2017d, Giles et al. 2018, Senra et al. 2020). Genotype 76 exhibits superior agronomic characteristics and has been used as material in clonal cultivars and seeds developed and released by Incaper. Clustering of materials using the modified Tocher optimization method formed 30 groups (Table 4). The first group included the highest number of accessions (191) with the greatest distance (0.187). The most similar accessions were 117 (87/89) and 182 (102/89), representing genotypes selected in the same year (1989) from farms of producers in the northern region of Espírito Santo through phenotypic mass selection.
In contrast, the most dissimilar accessions were 507, 470, and 463, representing Robusta genotypes introduced from the Agricultural Research Company of Minas Gerais (Epamig), Minas Gerais, Brazil.
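The cophenetic correlation reported for each distance matrix measures how faithfully a dendrogram preserves the original pairwise distances. A minimal sketch with SciPy is given below, assuming UPGMA (average linkage) as the clustering method behind the coefficient; the function name `upgma_cophenetic` is hypothetical, and the study itself used GENES and R rather than this code.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def upgma_cophenetic(dist):
    """Build a UPGMA tree from a square distance matrix and return it
    together with the cophenetic correlation coefficient."""
    dist = np.asarray(dist, dtype=float)
    # SciPy expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(dist, checks=True)
    tree = linkage(condensed, method="average")  # average linkage = UPGMA
    coefficient, _ = cophenet(tree, condensed)
    return tree, float(coefficient)
```

A coefficient close to 1 indicates that the tree distorts the original distances very little; the values of 0.614-0.652 reported here indicate a moderate fit for all three matrices.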
The results of grouping using UPGMA were consistent with those of grouping using the modified Tocher optimization method, but illustrative graphical representation was difficult due to the large number of genotypes. However, in dispersion analysis, the genotypes were distributed in the four quadrants of the graph (Figure 1) based on the principal coordinate method. The accessions classified as the Robusta-type were concentrated in the quadrants on the left and were represented by numbers 392-461 (introduced from IAC), 462-532 (introduced from Epamig), and 554-562 (selected by Incaper). The accessions identified as the Conilon-type or Robusta × Conilon hybrids were concentrated on the right and in the middle of the graph, respectively. The component clones of Incaper's clonal cultivars were spatially well distributed in the middle and on the left, representing the variability among them.

Table 3. Cophenetic correlations and minimum and maximum genetic distances of 562 Coffea canephora accessions in Incaper's germplasm bank collection evaluated based on the qualitative, quantitative, and grouped data matrices using the simple coincidence index, standardized Euclidean mean distance, and Gower method, respectively
Columns: Data matrix | Cophenetic correlation | Maximum distance | Accessions with the maximum distance
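The principal coordinate dispersion described above can be sketched with classical principal coordinates analysis (double centering of the squared distances followed by an eigendecomposition). This is the standard textbook procedure written with NumPy; the function name is hypothetical, and the authors' exact software settings are not given in the text.

```python
import numpy as np

def principal_coordinates(dist, n_axes=2):
    """Classical principal coordinates analysis of a square distance matrix.

    Returns the coordinates on the first n_axes axes and the share of the
    positive eigenvalue sum carried by each of those axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered matrix of squared distances (Gower's transformation).
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive_total = eigvals[eigvals > 0].sum()
    if positive_total <= 0:
        raise ValueError("distance matrix carries no variation")
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    return coords, kept / positive_total
```

Plotting the first two columns of the returned coordinates against each other gives a scatter of the kind shown in Figure 1, with accessions that are close in the distance matrix lying close together on the graph.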
The accessions in Incaper's germplasm collection are numbered according to the order of their inclusion from 1986 to 2016. Many accessions are obtained via phenotypic selection from farms in Espírito Santo and southern Bahia performed by Incaper researchers, with the purpose of sourcing the natural genetic variability present in crops resulting from natural crossing over the years. The results showed random grouping of many accessions, regardless of the year and place of selection, characterizing the well distributed variability of the species in the region.
These data and knowledge related to the genetic structure indicate that the Brazilian germplasm of C. canephora represents only a small portion of the total diversity of the species (Ferrão et al. 2019). Thus, additional efforts to collect and introduce accessions are warranted to expand the genetic basis as well as to maintain and characterize germplasm collections for exploring the genetic potential of the species in the light of constant demand for improved cultivars adapted to environmental and technological changes. Importantly, Incaper's clonal cultivars of Conilon coffee comprise a group of diverse genotypes with superior agronomic characteristics. Thus, to maintain the genetic basis of this species in the state of Espírito Santo, the producers must use the set of clones of each clonal cultivar and, whenever possible, multiple cultivars (clonal, seminal, or both) for plantation within their rural properties.

Figure 1. Graphical dispersion of 562 Coffea canephora accessions in Incaper's germplasm bank assessed using principal coordinates analysis. Robusta and Conilon accessions are shown in blue and pink, respectively. Component clones of cultivars are identified with numbers.