Determining genetic diversity in cotton genotypes to improve variability

Submitted on September 25, 2019 and accepted on September 4, 2020. Rev. Ceres, Viçosa, v. 67, n. 6, p. 464-473, nov/dec, 2020.
1 This work is part of the Master's Dissertation of the first author.
2 Universidade de São Paulo, Escola Superior de Agricultura "Luiz de Queiroz", Departamento de Genética e Melhoramento de Plantas, Piracicaba, São Paulo, Brazil. melissamiranda94@gmail.com
3 Universidade Federal de Uberlândia, Instituto de Ciências Agrárias, Uberlândia, Minas Gerais, Brazil. danieludia13@hotmail.com; thatianepaiva31@hotmail.com; larissa@ufu.br
4 Embrapa Algodão, Centro Nacional de Pesquisa de Algodão, Campina Grande, Paraíba, Brazil. francisco.farias@embrapa.br
*Corresponding author: melissamiranda94@gmail.com


INTRODUCTION
Cotton is one of the most recent and most promising agricultural sectors in Brazil. The 2019/2020 harvest was the fourth largest in the world (Conab, 2020). The success of cotton cultivation in Brazil is largely due to investments in research, production technology and domestic breeding programs.
The continued development of cultivars depends on making the most of available genetic resources. Because the Gossypium genus comprises numerous species and varieties, its genetic variability is vast (Vidal Neto & Freire, 2013). However, to produce genotypes with favorable agronomic and technological traits, breeders frequently choose successful cultivars as parent plants, which narrows the genetic base and leads to high levels of homogeneity across large areas of cropland (Mehboobur-rahman, 2012; Borém & Miranda, 2013).
This homogeneity poses risks such as greater vulnerability to biotic and abiotic stress and restricted variability in quantitative characters (Cruz et al., 2011; Borém & Miranda, 2013). Thus, breeding programs need a broad genetic base to build diverse populations with satisfactory agronomic performance that can be used to obtain genetic gains and to identify hybrids with greater heterosis (Cruz et al., 2011; Ludke et al., 2017).
Most research on genetic divergence in cotton focuses only on quantitative characters, such as productivity, yield and fiber quality, while disregarding the underlying phenotypic characters. However, it is important to integrate all possible variables to avoid segmentation in inferences about genetic divergence between genotypes and to provide a better basis for evaluating strategies that improve variability and lead to better decisions in breeding programs. Therefore, the present study evaluates the genetic diversity of cotton genotypes used in the top breeding programs in Brazil and recommends the best hybrid combinations for increasing variability.

MATERIAL AND METHODS
The experiment was carried out in the field during the 2017/18 growing season in Uberlândia, Minas Gerais, Brazil (18º 52' S and 48º 20' W, elevation 805 meters).
The three most-cultivated cotton genotypes (Gossypium hirsutum L.) were selected from each of the five top breeding programs in Brazil and the breeding program at the Universidade Federal de Uberlândia - UFU (PROMALG) (Table 1).
The trial was set up using a randomized complete block design (RBD) with 18 treatments and three replications. Each experimental plot consisted of four rows of cotton, spaced one meter apart and four meters long. Only the two centermost rows of each plot were evaluated, excluding half a meter at each end of the rows.
Agronomic aspects of five randomly sampled plants from each plot were evaluated using the cotton scale (Escala do Algodão in Portuguese) proposed by Marur & Ruano (2001). These evaluations were carried out at stage V5, at the appearance of the first boll (B1), at full flowering (FF) and at full maturity (MAT) (Table 2).
Yields of seed cotton (seed plus lint), seed and lint were determined after evaluating the plants at full maturity. In addition, standard lint samples (SS) from each plot were evaluated using High Volume Instruments (HVI) to determine indicators of fiber quality such as upper-half mean fiber length (UHM), the fiber length uniformity index (LU), short fiber index (SFI), fiber strength (STR), fiber elongation (ELG), micronaire index (MIC) and fiber maturity (MAT).
The resulting data from the 18 genotypes and 28 variables were evaluated by univariate and multivariate analyses using the GENES software package (Cruz, 2013). Univariate analysis was performed first and, depending on significance, the Scott-Knott test was then used to determine genotype groups (p < 0.05 and p < 0.01).
Several genotypic parameters were estimated, such as the genotypic determination coefficient (h²) and the ratio of CVg to CVe (CVr). Phenotypic (rf) and genetic (rg) correlations were determined for characters with the highest CVr and h² values.
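As an illustration of how such parameters are commonly obtained, the sketch below derives the genotypic determination coefficient and the CVg/CVe ratio from the mean squares of a randomized block ANOVA. The estimators used here (genotypic variance as (MSg - MSe)/r, and the coefficient on a genotype-mean basis as (MSg - MSe)/MSg) are assumed to be the usual ones and are not stated in the text; the function name and the input values are hypothetical.

```python
import math


def genotypic_parameters(ms_genotype, ms_error, n_reps, grand_mean):
    """Estimate genotypic parameters from randomized block ANOVA mean squares.

    Assumed estimators (not taken from the paper):
      var_g = (MSg - MSe) / r
      h2    = (MSg - MSe) / MSg   (genotype-mean basis)
    """
    var_g = (ms_genotype - ms_error) / n_reps
    h2 = (ms_genotype - ms_error) / ms_genotype
    # Coefficients of variation, in percent of the grand mean.
    cv_g = 100 * math.sqrt(max(var_g, 0.0)) / grand_mean
    cv_e = 100 * math.sqrt(ms_error) / grand_mean
    return {"h2": h2, "CVg": cv_g, "CVe": cv_e, "CVr": cv_g / cv_e}


# Hypothetical mean squares for one character, three replications.
params = genotypic_parameters(ms_genotype=12.0, ms_error=3.0, n_reps=3, grand_mean=40.0)
print(params)
```

A CVr close to or above 1 together with a high h² is generally read as a situation favorable to selection for that character.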
Genetic dissimilarity was calculated using the generalized Mahalanobis distance (D²ii') (Cruz et al., 2011). Genotype clusters were then determined using Tocher optimization and UPGMA clustering and displayed as a dendrogram.
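The generalized Mahalanobis distance between two genotypes can be sketched as D² = δ'Ψ⁻¹δ, where δ is the vector of differences between genotype means and Ψ is the residual covariance matrix among characters. The code below is a minimal pure-Python sketch under that assumption; the helper names and the two-character example values are hypothetical and do not come from the study.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def mahalanobis_d2(mean_i, mean_k, residual_cov):
    """Return D2 = delta' * inverse(residual_cov) * delta for two genotypes."""
    delta = [a - b for a, b in zip(mean_i, mean_k)]
    weighted = solve(residual_cov, delta)  # inverse(residual_cov) @ delta
    return sum(d * w for d, w in zip(delta, weighted))


# Hypothetical means for two characters and a hypothetical residual covariance.
d2 = mahalanobis_d2([30.0, 4.0], [28.0, 3.5], [[1.0, 0.2], [0.2, 0.25]])
print(round(d2, 4))
```

Unlike the Euclidean distance, this measure accounts for residual correlations among characters, which is why it requires replicated data.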
Dendrogram clusters were chosen using the method proposed by Mojena (1977) and adapted by Milligan & Cooper (1985). The cophenetic correlation coefficient (Mantel, 1967) was calculated to check the quality of UPGMA clustering. Finally, the relative contribution of each character to genetic divergence was quantified using the criterion proposed by Singh (1981).
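The cut-off rule can be illustrated with a short sketch. It assumes the commonly cited form of Mojena's criterion, in which the cut-off equals the mean of the dendrogram fusion levels plus k times their standard deviation, with k = 1.25 as suggested by Milligan & Cooper (1985). The fusion levels below are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def mojena_cutoff(fusion_levels, k=1.25):
    """Cut-off = mean + k * sample standard deviation of the fusion levels."""
    return statistics.mean(fusion_levels) + k * statistics.stdev(fusion_levels)


def n_clusters(fusion_levels, k=1.25):
    """Each fusion above the cut-off is left unmerged, adding one cluster."""
    cutoff = mojena_cutoff(fusion_levels, k)
    return 1 + sum(1 for level in fusion_levels if level > cutoff)


# Hypothetical fusion levels from a dendrogram of six items (five fusions).
levels = [1.0, 1.2, 1.5, 2.0, 6.0]
cutoff = mojena_cutoff(levels)
groups = n_clusters(levels)
print(round(cutoff, 2), groups)
```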

Analysis of the phenotypic characters of the cotton genotypes
Base populations can be studied to select parent plants that would provide promising crosses, genetic diversity and satisfactory agronomic performance (Cruz et al., 2011). Thus, the mean genotype values of the characters were grouped in order to evaluate the potential of each cultivar. Lint yield ranged from 35.67% to 45.84% (Table 3). UFUJP-P, UFUJP-H and BRS 433 FL B2RF had the lowest yields, while FM 980 GLT, TMG 47 B2RF and IMA 8405 GLT had the highest values, which were greater than the 40% target used in breeding programs (Vidal Neto & Freire, 2013). Carvalho et al. (2003) found lint yields from 26.4% to 41.7% for genotypes in the EPAMIG germplasm bank, whereas Santos et al. (2017) found lint yields from 41.37% to 45.56% for cultivars from various breeding programs.
The genotypes did not cluster by seed cotton yield, which may be due to high CV values. However, two and three groups were found for seed and lint yield, respectively, with TMG 45 B2RF producing the highest values (Table 3). DP 1552 B2RF, FM 982 GLT, TMG 45 B2RF, TMG 47 B2RF and TMG 82 WS produced higher yields than the averages for the 2017/18 crop in Minas Gerais, Brazil (3,966 kg ha⁻¹ of seed cotton, 1,586 kg ha⁻¹ of lint and 2,380 kg ha⁻¹ of seed) (Conab, 2018).
Globally, cotton bolls average 61.5% seeds and 38.5% fiber and, in general, larger seeds indicate lower fiber quantities (Beltrão, 2001). This was corroborated by Fang et al. (2017), who showed that cotton plants contain two gene loci that control the pleiotropic and inverse association of fiber percentage with seed index and seed size. Most of the seed cotton yield from the PROMALG UFU genotypes was due to superior seed yield. Specifically, the percentages of seed weight to total weight for UFUJP-H, UFUJP-P and UFUJP-B were 65.20%, 68.53% and 64.54%, respectively, which were higher than those of the other genotypes.
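For comparison, the state averages reported by Conab (2018) for the 2017/18 crop in Minas Gerais imply the following lint and seed fractions of seed cotton. This is simple arithmetic on the figures quoted in the text, shown only to put the PROMALG seed percentages in context:

```latex
\[
\frac{1586}{3966} \approx 0.400 \;(40.0\%\ \text{lint}), \qquad
\frac{2380}{3966} \approx 0.600 \;(60.0\%\ \text{seed}), \qquad
1586 + 2380 = 3966 .
\]
```

Against this roughly 60% seed share, the 64.54% to 68.53% observed for the UFUJP genotypes indicates a boll composition shifted toward seed.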
Intrinsic fiber quality is cultivar specific and, although influenced by environmental conditions, is mainly controlled by genetic factors (Freire, 2015). Understanding associations among characters is essential in choosing breeding strategies. Therefore, this paper analyzes correlations among characters and discusses mean indicators of fiber quality, where CVg/CVe and h² values were the highest.
Genotypic correlations were greater than or equal to phenotypic correlations in 91.67% of the pairs of characters, suggesting that genetic factors contributed more to these correlations than did environmental factors (Table 4). Phenotypic correlations have greater practical value since selections are generally based on phenotype (Cruz et al., 2012). The strongest and most significant phenotypic correlations were between the short fiber index and length uniformity (-0.92), maturity and micronaire (0.88), and fiber strength and length (0.76).
Four groups were formed according to fiber quality, with mean fiber lengths (UHM) varying from 27.47 to 32.08 mm (Table 5). Only TMG 45 B2RF had medium-length fibers, while the rest were classified as having long fibers, which is the goal of breeding programs. BRS 433 FL B2RF had the longest fibers (32.08 mm), which exceeded the values proposed by Embrapa (2002) and were classified as extra-long according to Cotton Incorporated (2018).
BRS 433 FL B2RF produced the greatest average fiber length but a lower lint yield. Similar results were found by Carvalho et al. (2015), who found a negative correlation between yield and fiber length in extra-long fiber cultivars, although many breeders have broken this linkage (Kennedy, 2018). However, given that only BRS 433 FL B2RF had extra-long fibers (Embrapa, 2002), no correlation could be shown between these characteristics. Santos et al. (2017) state that some genotypes may have both higher average fiber yields and fiber lengths, as was observed in FM 980 GLT, IMA 8405 GLT and TMG 47 B2RF.
UHM is correlated with fiber strength (STR); in other words, longer fibers tend to be stronger and finer. The same was found in the present study, as evidenced by a strongly positive correlation (rf = 0.76). The correlation data also showed that STR is directly related to the length uniformity index (rf = 0.63) and micronaire (rf = 0.53), and inversely associated with the short fiber index (rf = -0.63). These correlations are important for cotton breeding since they show that selecting for only one of these characters, preferably the one with the highest coefficient of genotypic determination (UHM), favors the simultaneous selection of the other characters.
The average STR values of these genotypes were ideal (greater than 27 gf tex⁻¹) and could be separated into four groups. The BRS 433 FL B2RF genotype alone was classified as very strong (34.92 gf tex⁻¹), while the others were classified as highly resistant (greater than 28.09 gf tex⁻¹).
A lower short fiber index (SFI) is desirable since a lower proportion of short fibers improves processing performance and results in better quality yarns (Santos et al., 2017; Cordão Sobrinho et al., 2015). BRS 433 FL B2RF had the lowest mean SFI (5.72%) of the group and was classified as very low (Embrapa, 2002). In line with the negative correlations between SFI and UHM (rf = -0.58), SFI and STR (rf = -0.63), and SFI and LU (rf = -0.92), BRS 433 FL B2RF also had the highest average strength and length uniformity and very high elongation, which suggests that the fibers from this cultivar would suffer limited damage during processing. The same characteristics were found in genotypes studied by Santos et al. (2017).
The genotypes could be separated into four groups according to fiber elongation. DP 1552, BRS 368, DP Delta Opal, TMG 45, UFUJP-H, UFUJP-P and UFUJP-H produced the highest averages (greater than 7.6%) and were classified as having very high elongation. The mean elongation values of all genotypes, except TMG 47 B2RF (medium elongation, 6.74%), showed that the fibers should withstand the high accelerations of processing without rupturing (Belot, 2018).
Three groups were formed among the genotypes regarding uniformity of length (LU). The fibers of the group composed of BRS 433 FL B2RF (85.00%) and DP 1552 B2RF (84.92%) had higher LU values and were classified as uniform, which is better for spinning (Lana et al., 2014). The other groups had medium to uniform LU values that were close to the 83% target of breeding programs (Cunha Neto et al., 2015).
The micronaire index of the genotypes varied from 3.05 to 4.15. Except for DP 1552 B2RF (4.15), all genotypes had fine fibers that fell within the optimal range defined by breeding programs (3.6 to 4.2) (Freire, 2015).
Although micronaire is important, it should not be considered in isolation given its correlation with MAT (rf = 0.88). Thus, fibers with low MIC and high MAT values are the most desirable since they produce fine, strong yarns and fabrics (Cunha Neto et al., 2015). In addition to the micronaire index, lint yield (rf = 0.34) and fiber strength (rf = 0.66) were also significantly correlated with maturity.
Although two groups were formed regarding fiber maturity, all genotypes were classified as below average according to Embrapa (2002). However, according to Santana et al. (2008), maturity values greater than 80%, as was the case for all genotypes in this study, indicate that the fibers are capable of maximizing dye uptake and retention (Belot, 2018).

Genetic divergence
Fiber maturity provided the highest relative contribution (36.57%) to genetic divergence (Singh, 1981), followed by micronaire (25.61%) and fiber elongation (8.31%) (Figure 1). In contrast, Cunha Neto et al. (2015) evaluated divergence among genotypes from the same breeding programs with white and colored fibers and found that the technological characteristics of the fiber contributed least to genetic diversity, while fiber yield and percentage contributed the most.
LU, SD/FF, SD/MAT, MD/B1, NR/MAT, NR/FF, SFI and LA/FF contributed little to the detection of genetic diversity. The lower genotypic determination coefficients and lower CVg/CVe ratios explain the reduced importance of most of these characters, but not of SFI, MD/B1 and LU, whose low contribution may have resulted from variation already represented by other characters (Cruz et al., 2011).
The relative importance of character data showed that the lint quality characters were more important than the agronomic variables (Figure 1). This may have occurred because these cultivars came from breeding programs and have already reached advanced levels for yield and production but are still evolving in quality. Fang et al. (2017) performed genomic analysis and detected more gene loci associated with lint yield than with fiber quality. This suggests that breeding for higher lint yield has been the main emphasis of cotton breeding over time.
The dendrogram based on UPGMA clustering and the generalized Mahalanobis distance (Figure 2) shows that the genotypes are separated into six clusters. Tocher clustering also separated the 18 genotypes into six distinct clusters (Table 6).
Combining the UPGMA and Tocher methods guarantees good estimates of genetic divergence (Gilio et al., 2017). These methods appear to be in partial agreement since they both produced the same number of clusters; however, the constituents of these clusters do differ somewhat due to the different ways of calculating genetic dissimilarity and of defining proximity between an individual and an existing group or between any two groups (Buttow et al., 2010; Cruz et al., 2011).
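To make the notion of proximity between an individual and an existing group concrete, the following is a simplified sketch of Tocher-style optimization. It assumes the usual textbook formulation: the threshold θ is the largest of the nearest-neighbour distances of all items, each group is seeded from the closest remaining pair, and a candidate joins a group while its mean distance to the group members does not exceed θ. The function name and the four-item distance table are hypothetical; this is not the GENES implementation.

```python
from itertools import combinations


def tocher(dist):
    """Cluster items with a simplified Tocher optimization.

    dist maps frozenset({a, b}) -> dissimilarity for every pair of items.
    """
    items = sorted({x for pair in dist for x in pair})

    def d(a, b):
        return dist[frozenset((a, b))]

    # Threshold: the largest of each item's nearest-neighbour distances.
    theta = max(min(d(a, b) for b in items if b != a) for a in items)

    remaining, clusters = set(items), []
    while remaining:
        if len(remaining) == 1:
            clusters.append(sorted(remaining))
            break
        # Seed the group with one member of the closest remaining pair;
        # its partner is then the first candidate tested below.
        seed, _ = min(combinations(sorted(remaining), 2), key=lambda p: d(*p))
        group = [seed]
        remaining.discard(seed)
        while remaining:
            def mean_to_group(c):
                return sum(d(c, g) for g in group) / len(group)

            best = min(sorted(remaining), key=mean_to_group)
            if mean_to_group(best) > theta:
                break  # no remaining item is close enough to this group
            group.append(best)
            remaining.discard(best)
        clusters.append(sorted(group))
    return clusters


# Hypothetical dissimilarities among four items.
pairs = {("A", "B"): 1, ("C", "D"): 2, ("A", "C"): 10,
         ("A", "D"): 11, ("B", "C"): 9, ("B", "D"): 12}
result = tocher({frozenset(k): v for k, v in pairs.items()})
print(result)
```

UPGMA, by contrast, always merges the two groups with the smallest average between-group distance and has no inclusion threshold, which is one reason the two methods can assign the same genotype to different clusters.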
All clustering methods distributed the Embrapa, TMG and Fibermax cultivars into three clusters, except the Tocher method, which distributed the Fibermax genotypes into two groups. The IMA cultivars separated into two clusters regardless of method; however, the composition of these clusters did vary by method. The consistent distribution of these cultivars suggests similarities within the breeding programs.
Regardless of clustering method, the PROMALG UFU genotypes were always found within a single cluster, while only one of the Monsanto DeltaPine cultivars (DP 1552 B2RF) was found in a separate cluster, indicating genetic similarity within these breeding programs.
Although the PROMALG UFU genotypes were arranged in a cluster with other cultivars by the Tocher optimization method, they were isolated by the UPGMA method. This differentiation occurred because the UFU genotypes were bred by crossing Gossypium hirsutum L. with Gossypium barbadense L. in order to incorporate the excellent fiber quality of G. barbadense L. in G. hirsutum L. cultivars.
UPGMA clustering and the Tocher optimization method grouped DP 1552 B2RF and BRS 433 FL B2RF separately from the other genotypes, while BRS 433 FL B2RF was grouped separately by all clustering methods because of its dissimilarity to the other genotypes. The distance between these cultivars and the rest may be due to superior fiber quality (Table 4), especially for BRS 433 FL B2RF, and lint yield (DP 1552 B2RF) (Table 3).
In general, the largest clusters consisted of genotypes from the greatest number of different breeding programs. Bertini et al. (2006) also identified clusters of cultivars from various breeding programs, suggesting that different programs share similar germplasm. In addition, Amalraj (1982) and Singh & Gill (1984) found no relationship between genetic and geographic diversity in cotton, since varieties from the same geographical origin could be found in different groups. They attributed this phenomenon to the selection and adaptation of populations. Breeders should therefore rely on genetic divergence rather than geographical distribution when choosing parents.
Dissimilarity within the clustering methods demonstrates divergence among these genotypes. However, at least one cultivar was similar across all the breeding programs. To maintain genetic variability and ensure selection gains, these cultivars should not be used in future crosses. Genetically related parents tend to share many genes or alleles and produce crosses with low levels of allelic heterozygosity and consequently low levels of vigor (Cruz, 2012).
Proximity between genotypes suggests that similar germplasm sources with the same alleles have been used to breed Brazilian cotton cultivars, which have also been influenced by varieties from the United States and Australia. According to Gutiérrez et al. (2002), these varieties have a narrow genetic base that has been selected from existing cultivars, while underutilizing wild germplasm (Penna, 2005; Iqbal et al., 2001; Bertini et al., 2006). Bertini et al. (2006) found that many cotton cultivars are descended from a few parents and stated that new alleles need to be introduced into cotton breeding.
The mean Mahalanobis distances (Table 7) were highest in BRS 433 FL B2RF and FM 980 GLT, demonstrating that these genotypes were the most divergent among those tested. Direct comparison between BRS 433 FL B2RF and FM 980 GLT shows that the fiber quality of BRS 433 FL B2RF was greater than that of the other genotypes in this study and exceeded Brazilian standards. Morello et al. (2017) also found that the fiber quality of BRS 433 FL B2RF was among the best of the cultivars grown in Brazil.
[Figure caption fragment: Mojena (1977). Cophenetic correlation coefficient (r): 0.60.]
Hybridization between clusters is more efficient than hybridization within a cluster for producing better progenies, because genetic dissimilarity is lower within clusters than between clusters.
Conversely, FM 980 GLT showed better production results (MD/B1, NN/MAT), especially lint yield, but lower fiber quality. According to Zeng et al. (2018), there may be a negative relationship between fiber yield and fiber quality.
Understanding the distance between genotypes is important for choosing the parents used in breeding since hybridization choices should be based on the magnitude of dissimilarities (Santos et al., 2017). Thus, in order to obtain genetic gains and to make the most of genetic variability among this set of cultivars, hybridizations between BRS 433 FL B2RF and FM 980 GLT could yield segregating populations with higher productive potential, lint yield and fiber quality.
BRS 433 FL B2RF was isolated from the other clusters, was generally quite divergent from the other genotypes and was least distant from UFUJP-B (234.41), UFUJP-P (241.80) and UFUJP-H (249.31). This probably resulted from the crossing of germplasm with greater genetic variability, which provided genetic gains and generated recombinants. This in turn allowed the selection of the best genetic combinations and explains the excellent fiber quality of BRS 433 FL B2RF.
Crosses with FM 980 GLT should improve the genetics of the breeding program at UFU since this cultivar is genetically distant and has strong agronomic characteristics, good fiber quality and high lint yield that could improve the proportion of fiber and seed in the cotton bolls of PROMALG's genetic material.
Genetic distances among the cotton genotypes in this study ranged from 68.64 to 1121.27 (Table 7). This range is similar to that of Araújo et al. (2014), who found distances of 23.01 to 1172.28 while evaluating the fiber quality and productivity of 11 cultivars developed solely by Embrapa's breeding program. These results indicate the limited genetic base of current cultivars.
Furthermore, the ratio between the highest and lowest values of D² in the present study was 16.34. This suggests moderate genetic divergence according to Nardino et al. (2017) and Paixão et al. (2008), who classified values of 33.6 and 39.0, respectively, as wide variability among corn genotypes. These findings show that cotton breeding programs in Brazil need to make better use of genetic resources.
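The value of 16.34 follows directly from the extreme distances reported in Table 7:

```latex
\[
\frac{D^2_{\max}}{D^2_{\min}} = \frac{1121.27}{68.64} \approx 16.34
\]
```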

CONCLUSIONS
Moderate genetic divergence was found among genotypes from the main cotton breeding programs in Brazil.
To make the most of the genetic variability among these cultivars, hybridizations between BRS 433 FL B2RF and FM 980 GLT should generate segregating populations with higher productive potential, lint yield and fiber quality. Crosses between the genotypes developed by PROMALG and FM 980 GLT could increase fiber yield and quality of UFUJP-B, UFUJP-H and UFUJP-P.
Fiber quality characters, particularly maturity, micronaire and fiber elongation, contributed more to detecting genetic divergence than did the other characters evaluated in this study.
The fiber quality traits had higher heritability than the yield components. Therefore, selection for fiber traits is facilitated and can be done in early generations, whereas for yield traits selection should be made in an early generation or directly in late generations.