Estimation of genetic diversity in full-sib families of elephant grass Cenchrus purpureus (Schumach.) Morrone

ABSTRACT One of the challenges of the energy sector is identifying renewable resources that are economically viable and have a lower impact on the environment. This study aimed to estimate the genetic diversity of eleven full-sib families of elephant grass, using quantitative traits associated with bioenergy production. The trial followed a randomized blocks design, with four replications and each plot (family) consisting of five plants, totaling 220 genotypes. Five quantitative traits were measured: dry matter yield, dry matter percentage, plant height, stem diameter and number of tillers. The genetic diversity was estimated using multivariate methods (principal component analysis and hierarchical clustering). The analysis revealed a significant genetic diversity among the full-sib families of elephant grass, with a greater variability observed for dry matter yield and number of tillers. The families 1, 2, 5, 7 and 8 exhibited superior genotypes for bioenergy production-related traits. The cluster analysis generated twenty clusters, enabling the differentiation of genotypes. Eight clusters comprised genotypes that combined a high dry matter yield and plant height with a number of tillers above the overall mean.


INTRODUCTION
In recent decades, the global challenge faced by major world powers has been to achieve economic growth while ensuring sustainability (Nong et al. 2020, Topcu et al. 2020). As economic growth is accompanied by environmental concerns,
In this context, the use of biomass presents an alternative to increase clean energy in the national and global energy mix. Given the expanding possibilities of biomass use across various industrial sectors and the high demand for biomass, many institutions have focused their research on crops with a high biomass yield index (Marafon et al. 2012). Among tropical species, elephant grass [Cenchrus purpureus (Schumach.) Morrone] stands out due to its characteristics that contribute to biomass quality, including a greater capacity for dry matter accumulation, high fiber content, high C/N ratio and high calorific value (Daher et al. 2014, Menezes et al. 2015, Rocha et al. 2015).
The elephant grass breeding program for bioenergy purposes developed by the Universidade Estadual do Norte Fluminense Darcy Ribeiro has achieved promising results (Lima et al. 2011, Daher et al. 2014, Menezes et al. 2014, Rossi et al. 2014, Santos et al. 2014, Rocha et al. 2015, Sousa et al. 2016, Silva et al. 2020). Characterizing the germplasm in a breeding program is essential, as it provides a foundation for the better utilization of available genetic resources (Nass & Paterniani 2000, Vieira et al. 2007). Therefore, studying traits of interest becomes necessary to assist in the selection of superior genotypes (Cavalcante & Lira 2010).
In this regard, multivariate analysis techniques serve as important tools for genotype selection based on their relevant traits. The principal component analysis (PCA), among various multivariate techniques, is a powerful and robust integrated technique that reduces the dimensionality of the set of traits of interest, generating orthogonal axes, called principal components, that are linear combinations of the original variables (Leite et al. 2016, Maia et al. 2016). The components obtained in the analysis exhibit a correlation profile among the traits, indicating which traits contribute the most to the diversity of the population under study. Consequently, this technique facilitates the selection of superior genotypes in elephant grass cultivation by practically identifying genotypes with optimal performance for the evaluated traits of interest.
The present study aimed to estimate the genetic diversity of full-sib progenies of elephant grass using principal components and hierarchical clustering based on quantitative traits associated with energy biomass production.

MATERIAL AND METHODS
Twenty elephant grass accessions obtained from the germplasm bank of the Universidade Estadual do Norte Fluminense Darcy Ribeiro were selected as parents to establish full-sib families. The selection of these parents was based on previous studies evaluating their biomass production potential (Rocha et al. 2015) and genetic diversity (Lima et al. 2011).
Manual crosses were performed by collecting pollen grains from the selected elephant grass genotypes (male parents) in paper bags. These were then placed on the inflorescences of the female parent genotypes, which were protected with paper bags until their receptive stigmas were visible. The crosses were carried out between 8 a.m. and 10 a.m.
Eleven full-sib families were obtained through the manual crosses among parents (Table 1), and the seeds from each family were harvested separately. On October 6, 2020, the seeds were sown in styrofoam trays with 128 cells filled with forest substrate. The seedlings were transplanted to the field when they reached a height of 20 cm, approximately at 40 days after germination.
On November 18, 2020, the experiment was established at the Empresa de Pesquisa Agropecuária do Estado do Rio de Janeiro (Pesagro-Rio) station (21°19'23''S, 41°19'40''W and average altitude of 25 m), in Campos dos Goytacazes, northern region of the Rio de Janeiro state, Brazil.
The region has a tropical hot and humid climate, classified as Aw according to Köppen, with a dry period during the winter and a rainy period in the summer, with average annual rainfall of 1,053 mm. The fertilization practices followed Almeida et al. (1988), with 100 kg ha⁻¹ of P₂O₅ (single superphosphate) applied at planting. After two months of growth, topdressing was conducted using 25 kg ha⁻¹ of N (ammonium sulfate) and 25 kg ha⁻¹ of K₂O (potassium chloride).
The experimental design was randomized blocks, with four replications and each block consisting of 11 families of full sibs. Within each plot, five plants were spaced 1 m apart between and within the rows. An evaluation cut was performed at 10 months of growth.
Quantitative traits were measured in the full-sib families during the 10-month field evaluation, as follows: plant height (m): individual measurement of the height of each plant (clump) within the plot using a graduated ruler; stem diameter (cm): average diameter of three stems per plant within the plot, measured with a digital caliper at approximately 10 cm from the ground; number of tillers per linear meter: count of the number of tillers for each plant within the plot; dry matter yield: estimated by weighing the green matter of whole plants (kg), using a suspended digital scale, within a defined area (1 m²). The dry matter percentage was determined from sampled plants and used to convert the weight to t ha⁻¹ year⁻¹.
Additionally, the tillers from the five individual plants in the experimental area were weighed. Two tillers were randomly sampled, ground using a grinder and packed in paper bags. The materials (leaves and stems) from each plot were weighed, labeled and dried in a forced-air oven at 65 °C, for 72 h. The samples were re-weighed to determine the air-dried weight (Silva & Queiroz 2002). The dry material (leaf and stem) was ground using a Wiley mill with a 1 mm sieve and packed in plastic bags. To determine the oven-dried weight, 2 g of each ground material were placed in an oven at 105 °C, for 12 h, and weighed again. These measurements allowed the calculation of the following variables: dry matter: percentage obtained by multiplying the values of the air-dried sample and the oven-dried sample; dry matter yield: estimated by multiplying the dry matter percentage and the tiller weight of each individual plant within the plot. The results were converted to t ha⁻¹.
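The two-stage dry matter calculation described above can be illustrated with a minimal sketch. It assumes that the "values" being multiplied are the air-dried and oven-dried fractions of the sample, and that the fresh weight refers to the 1 m² area mentioned in the text. The function names and all sample weights are hypothetical and are not data from this trial.

```python
def dry_matter_percentage(fresh_g, air_dried_g, subsample_g, oven_dried_g):
    """Dry matter (%) as the product of the air-dried and oven-dried fractions."""
    air_fraction = air_dried_g / fresh_g        # after 65 °C for 72 h
    oven_fraction = oven_dried_g / subsample_g  # after 105 °C for 12 h
    return air_fraction * oven_fraction * 100.0


def dry_matter_yield_t_ha(fresh_weight_kg, dm_percent, area_m2=1.0):
    """Convert the fresh weight harvested on `area_m2` to dry matter in t/ha."""
    dry_kg = fresh_weight_kg * dm_percent / 100.0
    kg_per_ha = dry_kg * (10_000.0 / area_m2)   # 10,000 m² in one hectare
    return kg_per_ha / 1000.0                   # kg -> t


# Hypothetical sample: 500 g fresh, 150 g air-dried, 2 g subsample, 1.8 g oven-dried
dm = dry_matter_percentage(fresh_g=500.0, air_dried_g=150.0,
                           subsample_g=2.0, oven_dried_g=1.8)
# Hypothetical plant: 10 kg of fresh matter on 1 m²
dmy = dry_matter_yield_t_ha(fresh_weight_kg=10.0, dm_percent=dm)
print(round(dm, 2), round(dmy, 2))
```

With a 1 m × 1 m spacing, each plant occupies 1 m², so in this sketch the conversion from kg per plant to t ha⁻¹ reduces to multiplying the dry weight by 10.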
Descriptive statistics were employed for the five quantitative traits related to biomass production for energy generation, evaluated in 220 genotypes from 11 families of elephant grass full-sibs. The calculated parameters included the minimum value, maximum value, standard deviation and coefficient of variation.
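As an illustration of these parameters, the sketch below computes them for a single trait with the Python standard library. The study itself used the R package pastecs, so this is only an equivalent calculation under the assumption that the sample standard deviation (n - 1 denominator) was used. The tiller counts are hypothetical.

```python
import statistics


def describe(values):
    """Minimum, maximum, mean, sample standard deviation and CV (%) of a trait."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # unit-free measure of variability
    }


# Hypothetical number of tillers for three plants
summary = describe([10, 20, 30])
print(summary)
```

Because the coefficient of variation divides the standard deviation by the mean, it can be compared across traits measured in different units, such as meters and t ha⁻¹.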
Principal component analysis (PCA) and hierarchical clustering were performed using the average values of the five quantitative traits related to biomass production for energy generation. Prior to the PCA, the data were subjected to Bartlett's test of sphericity, to assess whether the correlation matrix of the traits differed from an identity matrix. Subsequently, the Kaiser-Meyer-Olkin index was calculated to determine the proportion of data variance common to all variables.
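For reference, Bartlett's test of sphericity is commonly computed from the determinant of the trait correlation matrix R as χ² = -[(n - 1) - (2p + 5)/6] ln|R|, with p(p - 1)/2 degrees of freedom. The sketch below implements only this statistic, not the p-value or the Kaiser-Meyer-Olkin index. The correlation matrix and helper names are hypothetical, and the study itself relied on R packages for this step.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom of Bartlett's sphericity test."""
    p = len(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical correlation matrix for two traits, with n = 220 genotypes
chi2, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=220)
print(round(chi2, 2), df)
```

A large statistic relative to the chi-square distribution with `df` degrees of freedom suggests that the traits are correlated enough for PCA to be informative.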
For the cluster analysis, a dissimilarity matrix was obtained based on Mahalanobis' generalized distance ($D_{ii'}$), calculated using the equation $D_{ii'} = (\delta' \varphi^{-1} \delta)^{0.5}$, where $\varphi^{-1}$ is the inverse of the sample covariance matrix, $\delta$ is the vector of differences, with $\delta' = [d_1\ d_2\ \ldots\ d_n]$, and $d_j = X_{ij} - X_{i'j}$, where $X_{ij}$ and $X_{i'j}$ are the means of variable $j$ in genotypes $i$ and $i'$, respectively.
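The distance defined above can be sketched in a few lines of plain Python. Instead of inverting the covariance matrix explicitly, the sketch solves the equivalent linear system, which gives the same quadratic form. The trait means and the covariance matrix are hypothetical, and the helper names `solve` and `mahalanobis` are illustrative only.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x_i, x_k, cov):
    """Generalized distance between two genotypes given the trait covariance."""
    delta = [a - b for a, b in zip(x_i, x_k)]  # vector of trait differences
    weighted = solve(cov, delta)               # equals cov^{-1} * delta
    return math.sqrt(sum(d * w for d, w in zip(delta, weighted)))


# Hypothetical two-trait means (e.g. yield and height) for two genotypes
cov = [[4.0, 0.0], [0.0, 1.0]]
d = mahalanobis([30.0, 2.5], [28.0, 2.5], cov)
print(d)
```

With an identity covariance matrix the result reduces to the Euclidean distance, which is a convenient sanity check for any implementation.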
Using the distance matrix, the unweighted pair-group method using arithmetic means (UPGMA) was applied for hierarchical clustering and construction of the dendrogram. The optimal number of clusters was determined using Mojena's method, with a stopping rule of k = 1.25, based on the relative size of fusion levels in the dendrogram (Milligan & Cooper 1985).
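The sketch below illustrates both steps on a toy example: an average-linkage (UPGMA) agglomeration that records the fusion levels, followed by Mojena's stopping rule, which cuts the dendrogram before the first fusion level that exceeds the mean of all levels plus k times their standard deviation. It is a plain-Python illustration, not the hclust-based R workflow used in the study, and the six one-dimensional points are hypothetical.

```python
import statistics


def upgma_fusion_levels(dist):
    """Heights at which clusters merge under average linkage (UPGMA)."""
    clusters = {i: [i] for i in range(len(dist))}
    levels = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Average of all pairwise distances between the two clusters
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        height, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        levels.append(height)
    return levels


def mojena_n_clusters(levels, k=1.25):
    """Number of clusters kept by Mojena's rule for the given fusion levels."""
    threshold = statistics.mean(levels) + k * statistics.stdev(levels)
    for step, height in enumerate(levels):
        if height > threshold:
            # `step` fusions were accepted before this one
            return len(levels) + 1 - step
    return 1


# Hypothetical genotypes placed on a line, forming two well-separated groups
points = [0.0, 1.0, 2.0, 20.0, 21.0, 22.0]
dist = [[abs(p - q) for q in points] for p in points]
levels = upgma_fusion_levels(dist)
n_clusters = mojena_n_clusters(levels, k=1.25)
print(levels, n_clusters)
```

In this toy case only the last fusion, which joins the two distant groups, exceeds the threshold, so the rule keeps two clusters.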
All the diversity estimation analyses were performed using the R software (R Core Team 2021). Descriptive statistics and boxplots were generated using the pastecs and ggplot2 packages, respectively. The principal component analysis used the psych, FactoMineR and factoextra packages. The hierarchical clustering analysis employed the hclust native function and the circlize package.

RESULTS AND DISCUSSION
To estimate the genetic diversity of elephant grass full-sibs, descriptive statistics, principal component and hierarchical clustering analyses were performed. The descriptive analysis showed that the traits with the highest coefficients of variation (CV%) were number of tillers (31.72 %) and dry matter yield (31.09 %), followed by stem diameter (14.29 %). The CV% values for dry matter percentage (11.06 %) and plant height (11.45 %) were considered moderate (Table 2).
The high CV% values and standard deviations obtained for number of tillers and dry matter yield indicate a wide variability, as observed in the boxplots, which were distributed among and within the elephant grass full-sib families (Figure 1). The coefficient of variation parameter is an efficient indicator for distinguishing genotypes, as it is not related to any measurement unit, making it efficient in comparing traits, in terms of variability (Khadivi-Khub et al. 2016). The observed variability arises from the interaction between genetic and non-genetic factors, such as temperature, photoperiod, rainfall distribution, soil type and fertility, and crop management. These factors contribute to the variability of these traits, as quantitative traits are polygenic and strongly influenced by the environment.
The principal component analysis generated three significant components (Table 3). These orthogonal components are obtained by linearly combining the original variables to discriminate and maximize the understanding of the existing correlation structure among the evaluated quantitative traits (Abdi & Williams 2010, Silva & Sbrissia 2010). The first two components explained 81 % of the accumulated variability. According to Yang et al. (2009), a graphical biplot analysis should explain at least 60 % of the data variation with the first two principal components. The significance of the components was determined using the latent root criterion (Kaiser 1958, Hair et al. 2009), which retains components with eigenvalues > 1. The factor loadings indicate the correlation between the variable and the component, with values above 0.57 considered the most relevant for representation. 2020) emphasized that the importance of a principal component is determined by the total variance explained by that component.
Elephant grass has promising advantages, if compared to other biomass sources such as eucalyptus. The calorific value is an indicator of the energy present in a biomass source. Studies comparing these species have shown that elephant grass can reach a calorific value of 4,440 kcal kg⁻¹, depending on the variety, while eucalyptus can reach 4,601 kcal kg⁻¹, depending on age. However, elephant grass exhibits higher energy gain than eucalyptus, considering the higher cost of obtaining biomass from forest species. Elephant grass is perennial, can be cultivated in different regions and small properties, requires low financial resources, is easy to manage, allows mechanization and can be harvested in up to 90 days. Furthermore, while the average annual dry matter yield from eucalyptus can reach 20 t ha⁻¹ year⁻¹ under the best conditions, elephant grass produces, on average, up to 60 t ha⁻¹ year⁻¹ (Zanetti et al. 2010, Marafon et al. 2016).
Table 3 shows the traits that contributed the most to the discrimination of genotypes, as indicated by the highest factor loadings on the respective components. Dry matter yield was the most important trait for genotype discrimination (0.92), followed by plant height (0.66) in the first principal component, which explained 35.11 % of the total variation. Dry matter yield is the target trait in the elephant grass breeding programs for the selection of superior genotypes, as it represents the final product in the biomass production cycle for the practical use of biomass. Thus, the first component becomes a practical tool for distinguishing genotypes with the highest performances, in terms of dry matter yield, as indicated by its factor loading.
e-ISSN 1983-4063 - www.agro.ufg.br/pat - Pesq. Agropec. Trop., Goiânia, v. 53, e75967, 2023
The principal component 2 explained 26.80 % of the variance, with number of tillers (0.76) and stem diameter (0.57) being the prominent traits. For the principal component 3, only dry matter percentage (0.76) contributed to the component variance. Therefore, the principal component analysis reduced the dimensionality of the data measured by the interrelated traits, enabling a more practical analysis of the existing diversity between genotypes and facilitating the selection of genotypes based on the main traits of interest (Maia et al. 2016).
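The reported variance shares can be related to eigenvalues with a short calculation. The sketch assumes that the PCA was run on the correlation matrix of the five standardized traits, so that the total variance equals 5 and each eigenvalue is the variance share multiplied by 5. That assumption is not stated in the text, and the variable names are illustrative.

```python
N_TRAITS = 5  # assumption: standardized traits, so total variance equals 5

# Share of total variance (%) reported for the first two components
explained = {"PC1": 35.11, "PC2": 26.80}

# Under the assumption above, eigenvalue = share of variance * number of traits
eigenvalues = {pc: share / 100.0 * N_TRAITS for pc, share in explained.items()}

# Latent root (Kaiser) criterion: retain components with eigenvalue > 1
retained = [pc for pc, value in eigenvalues.items() if value > 1.0]

cumulative = sum(explained.values())
print(eigenvalues, retained, round(cumulative, 2))
```

Under this assumption both components pass the latent root criterion, and their combined share of 61.91 % agrees with the 61.9 % reported for the biplot.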
The evaluation of eigenvalues (Figure 2A) is the first step in the principal component analysis, explaining the total variance displayed in the component. The variance of the set of genotypes is represented in the five principal components obtained (Figure 2B). From the third component onwards, the contribution of the vector in the distinction of the genotypes of the elephant grass families decreased, and the first two vectors of the principal components provided the two-dimensional distribution of this population sample based on the evaluated traits.
The biplot scatterplot showed 61.9 % of the total variation explained by the first two principal components (Figure 3). This approach demonstrates the relationships of multiple traits and their weight in differentiating genotypes. It has been successfully used to examine relationships among traits and to select superior genotypes based on multiple traits of interest in various crops, such as wheat (Dehghani et al. 2012), popcorn (Santos et al. 2017), common bean (Oliveira et al. 2018), papaya and mangaba (Hancornia speciosa) (Santana et al. 2021a, Santana et al. 2021b).
The biplot graphical visualization enables the assessment of associations among the evaluated traits by analyzing the cosine of the angle between the vectors, provided that the biplot accounts for a substantial portion of the total variation (Dehghani et al. 2012, Gravina et al. 2020). By taking the cosine of the angle between the vectors, the correlation coefficient between any two traits in Figure 3 can be determined. Accordingly, the dry matter yield exhibited positive correlations with the other traits, particularly with plant height and number of tillers, whereas the dry matter percentage showed a stronger correlation with number of tillers and dry matter yield. Therefore, selecting genotypes based on these traits would lead to increased dry matter yield and dry matter percentage gains.
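The cosine rule mentioned above can be sketched as follows. The trait vectors are hypothetical coordinates on the first two components and are not the loadings from Figure 3. The cosine only approximates the correlation between traits, and the approximation improves as the biplot captures a larger share of the total variation.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two trait vectors in the biplot plane."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical (PC1, PC2) coordinates of three trait vectors
trait_a = (1.0, 0.0)
trait_b = (1.0, 1.0)
trait_c = (0.0, 1.0)

print(round(cosine(trait_a, trait_b), 3))  # acute angle: positive association
print(round(cosine(trait_a, trait_c), 3))  # right angle: no association
```

Vectors separated by an acute angle indicate positively associated traits, a right angle indicates nearly independent traits, and an obtuse angle indicates a negative association.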
Quadrant 1 encompassed the dry matter percentage, number of tillers and dry matter yield traits, suggesting that genotypes falling within this quadrant could be chosen to obtain high-yielding plants. Notably, the genotypes G064, G061 and G089 stood out in this regard. Quadrant 2 included the stem diameter and plant height traits, with the genotypes G192, G182, G167 and G043 showing promise for these traits. Genotypes located in quadrants without associated traits are considered unsuitable for recommendation as sources of alleles for the measured traits (Sabaghnia & Janmohammadi 2016, Santos et al. 2017).
To assess the dissimilarity among the genotypes and group them accordingly, a cluster analysis using the UPGMA linkage method (Figure 4) was performed. The resulting dendrogram enabled the classification of genotypes into 20 distinct groups, following Mojena's method. Among the formed groups, ten consisted of a single genotype, indicating the highest level of divergence (Table 4). Ten groups exhibited mean values for dry matter yield above the overall mean (28.07 t ha⁻¹ year⁻¹) for this trait. These groups displayed noteworthy productivity, considering that the most productive cultivars available in the market yield between 27.87 (Cameroon) and 49.75 t ha⁻¹ year⁻¹ (BRS Capiaçu) (Neiva 2016). The most productive genotypes, namely G065, G142, G032, G089, G192, G061, G038, G060, G197, G140, G064, G067, G164, G034, G020, G093, G129 and G033, exhibit a high potential for selection and the continuity of the breeding program for bioenergy purposes. Regarding the plant height trait, 10 groups demonstrated mean values above the overall mean (2.66 m). For number of tillers, nine groups displayed mean values exceeding the overall mean (22.08).
The judicious selection of genotypes based on genetic distance has been an essential tool in plant breeding. The analyses conducted in this study confirm the variability among the 220 elephant grass genotypes belonging to 11 full-sib families. Promising genotypes from groups that exhibit simultaneously high dry matter yield and height, as well as number of tillers exceeding the overall mean (22 tillers), were identified. These traits hold a significant importance for the elephant grass utilization in bioenergy applications.

CONCLUSIONS
1. The full-sib families of elephant grass exhibit variability in key traits related to biomass production for bioenergy purposes;
2. Dry matter yield was the trait that contributed the most to the differentiation of accessions;
3. The examined genotypes offer a valuable resource for achieving a high dry matter yield, with potential for developing new cultivars with enhanced bioenergy potential.

Figure 1. Boxplots for five quantitative traits of elephant grass full-sib families.

Figure 2. Scree plot (A) indicating the number of significant principal components and representation of the explained variance (B) for elephant grass full-sib families.

Table 1. Full-sib families obtained from directed crosses.

Table 2. Descriptive statistics for quantitative traits of 220 genotypes belonging to 11 elephant grass full-sib families.

Table 3. Factor loadings, eigenvalue, relative variance and cumulative variance in the principal components of five quantitative traits in full-sib families of elephant grass.

Table 4. Number of genotypes and families represented in the 20 groups obtained by dendrogram hierarchical clustering.