The use of different clustering methods in the evaluation of genetic diversity in upland cotton

The continuous development and evaluation of new genotypes through crop breeding is essential in order to obtain new cultivars. The objective of this work was to evaluate the genetic divergence between cultivars of upland cotton (Gossypium hirsutum L.) using the agronomic and technological characteristics of the fibre, in order to select superior parent plants. The experiment was set up during 2010 at the Federal University of Ceará in Fortaleza, Ceará, Brazil. Eleven cultivars of upland cotton were used in an experimental design of randomised blocks with three replications. In order to evaluate the genetic diversity among cultivars, the generalised Mahalanobis distance matrix was calculated, with cluster analysis then being applied, employing various methods: single linkage, Ward, complete linkage, median, average linkage within a cluster and average linkage between clusters. Genetic variability exists among the evaluated genotypes. The most consistent clustering method was that employing average linkage between clusters. Among the characteristics assessed, mean boll weight presented the highest contribution to genetic diversity, followed by elongation at rupture. Employing the method of average linkage between clusters, the cultivars with the greatest genetic divergence were BRS Acácia and LD Frego; those of greatest similarity were BRS Itaúba and BRS Araripe.


INTRODUCTION
Upland cotton (Gossypium hirsutum L. var. latifolium Hutch.) produces the most important textile fibre on the planet, accounting for 40% of all fibres produced (OZYIGIT, 2009). The cotton crop had a record harvest of 5.059 million tons of cotton (as seed) in 2011, 72.6% greater than in 2010. In the northeast of Brazil, which accounts for 34.53% of the area of the country, there was a decline in cotton acreage in all states, with the exception of Ceará, which registered an increase of 4.4% (CONAB, 2010).
Knowledge of the genetic diversity among a group of parent plants of any species is of great importance, especially in the identification of those hybrid combinations having a greater heterotic effect. When such combinations are taken as a basis, the probability of recovering superior genotypes in segregating generations is greater (ROTILI et al., 2012). The use of parent plants with low genetic diversity in the formation of populations for selection reduces genetic variability, making the selection of superior genotypes difficult (CRUZ; CARNEIRO, 2003).
Predictive methods of genetic diversity have been widely used. Among them are those that quantify diversity by means of measurements of dissimilarity, one such being the generalised Mahalanobis distance (CARVALHO et al., 2003). The choice of the most appropriate method has been determined by the accuracy required by the researcher, the ease of analysis and the way in which the data were obtained. Several studies have been carried out in order to compare the methods used to measure dissimilarity and the methods of clustering in various crops: wheat (BERTAN et al., 2006), beans (CARGNELUTTI FILHO et al., 2010) and maize (CARGNELUTTI FILHO; GUADAGNIN, 2011). Silva Filho et al. (2005) verified genetic variability among the cultivars of upland cotton they evaluated, and were able to identify divergent genotypes with good technological fibre characteristics, which in turn receive greater attention from breeders for use in artificial crossings. This study therefore aimed to evaluate the genetic diversity of cultivars of upland cotton based on several clustering methods, under the conditions found in Fortaleza, Ceará, Brazil.

MATERIAL AND METHODS
The experiment was carried out in the field on the Pici Campus of the Federal University of Ceará, in the city of Fortaleza, in the northern region of the Brazilian state of Ceará. Treatments consisted of 11 cultivars of upland cotton, all developed by the Breeding Program of Embrapa Cotton, with the exception of LD Frego (Table 1).
The climate in the region is Aw (tropical wet), according to the Köppen classification, with an average annual temperature of 26 °C, a maximum of 34 °C and a minimum of 21 °C. The mean annual rainfall is 1,600 mm, with the driest period in winter and maximum rainfall in the autumn (INMET, 2008). The soil of the experimental area is sandy; its chemical properties, according to the soil analysis, are shown in Table 2.
Fertilization was carried out according to the soil analysis. The experiment was conducted in 2010, from April to August, giving a cycle of 120 days. The experimental design used was of randomised blocks with three replications.
Each experimental lot consisted of two rows 2.5 m long, with 10 plants in each row, spaced 0.25 m × 0.70 m apart, with the 10 central plants from each lot being considered usable, eliminating those at the edges. A conventional tillage system was adopted, with the soil being prepared by ploughing followed by harrowing. Irrigation was carried out by sprinkler according to the water requirements of the crop. Harvesting was done manually.
Firstly, univariate analysis of variance was performed, with the aim of verifying the variability among cultivars. Subsequently, for the analysis of genetic divergence, the genetic distance between the different pairs of genotypes was calculated, employing the generalised Mahalanobis distance (D²ij) as a measure of genetic dissimilarity among the cultivars. To estimate this distance, the averages of each variable were computed for each cultivar; the residual covariance matrix, the data transformation matrix, the variance of the transformed variables and the averages of the uncorrelated variables were then established, and finally the pivotal condensation technique was used to resolve the dispersion matrix.
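As an illustration of this calculation, the generalised Mahalanobis distance between two cultivars can be written as D²ij = d' S⁻¹ d, where d is the difference between the two vectors of cultivar means and S is the residual covariance matrix. The sketch below is a minimal version of that formula and is not the routine used by GENES: the function names, the cultivar labels A, B and C and all numerical values are hypothetical, and plain Gaussian elimination stands in for the pivotal condensation step.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(mean_i, mean_j, resid_cov):
    """D2 = d' S^-1 d, with d the difference between two cultivar mean vectors."""
    d = [u - v for u, v in zip(mean_i, mean_j)]
    return sum(di * yi for di, yi in zip(d, solve(resid_cov, d)))


def distance_table(means, resid_cov):
    """D2 for every pair of cultivars; `means` maps name -> list of variable means."""
    return {
        (a, b): mahalanobis_sq(means[a], means[b], resid_cov)
        for a, b in combinations(means, 2)
    }


# Hypothetical means for two variables and a hypothetical residual covariance matrix.
means = {"A": [5.0, 30.0], "B": [6.0, 31.0], "C": [5.0, 33.0]}
resid_cov = [[1.0, 0.5], [0.5, 2.0]]
d2 = distance_table(means, resid_cov)
```

With an identity covariance matrix the function reduces to the squared Euclidean distance, which gives a quick sanity check on any implementation.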
From the dissimilarity matrix, a cluster analysis was performed using the hierarchical methods of single linkage (nearest neighbour), Ward, complete linkage (furthest neighbour), median, average linkage within a cluster and average linkage between clusters, allowing dendrograms to be produced. To validate the clusters, that is, to verify the ability of the dendrogram to reproduce the dissimilarity matrix, the cophenetic correlation coefficient (CCC) was calculated. The CCC is the Pearson correlation coefficient between the distance matrix (D²ij) and the cophenetic matrix (C). All the statistical analyses were carried out using the GENES software (CRUZ, 2006).
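The validation step can be sketched for one of the methods, average linkage between groups (UPGMA). The code below is a minimal illustration, not the GENES implementation: the function names and the four-genotype dissimilarity values are hypothetical. It builds the cophenetic matrix from the heights at which pairs of genotypes are first joined and then correlates it with the original dissimilarities.

```python
from itertools import combinations
from statistics import mean


def upgma_cophenetic(dist, n):
    """Cophenetic distances under average linkage between groups (UPGMA).

    `dist` maps frozenset({i, j}) -> dissimilarity for items 0..n-1.
    """
    clusters = [frozenset([i]) for i in range(n)]
    coph = {}

    def between(a, b):
        # Mean of all original dissimilarities between members of a and b.
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: between(*p))
        height = between(a, b)
        for i in a:
            for j in b:
                coph[frozenset((i, j))] = height  # height of first joining
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return coph


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def cophenetic_correlation(dist, n):
    """CCC: correlation between original and cophenetic dissimilarities."""
    coph = upgma_cophenetic(dist, n)
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Hypothetical dissimilarities among four genotypes (indices 0..3).
values = {(0, 1): 1.0, (0, 2): 3.0, (0, 3): 6.0, (1, 2): 4.0, (1, 3): 5.0, (2, 3): 2.0}
dist = {frozenset(pair): d for pair, d in values.items()}
coph = upgma_cophenetic(dist, 4)
ccc = cophenetic_correlation(dist, 4)
```

In this toy example genotypes 0 and 1 join at height 1, genotypes 2 and 3 at height 2, and the two groups at the mean of the four cross distances (4.5), so the dendrogram only approximates the original matrix and the CCC is below one.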

RESULTS AND DISCUSSION
The reliability index was the characteristic that contributed most to the diversity analysis, at 23.33%, followed by fibre yield at 12.69% and average boll weight at 11.36%. This reliability index includes several technological characteristics of the fibre (Table 3).
The relative contribution of each characteristic to divergence is of great importance in identifying those characteristics having the highest contribution, and also in excluding those with the least (MISSIO et al., 2007). The variables number of bolls, 100-seed weight, fibre length, short fibre index, reflectance and degree of yellowness contributed the least, with values close to zero. These same variables were suggested for exclusion by the evaluation of divergence between the cultivars. This exclusion, according
The groups formed based on the Mahalanobis distance (D²ij) and the clustering methods used were similar to each other in the dendrograms shown in Figure 1, where the cultivars are: BRS Cedro (1); BRS Aroeira (2); BRS Itaúba (3); BRS Araçá (4); BRS Ipê (5); BRS Acácia
There were 2, 2, 2, 2, 4 and 3 clusters of genotypes formed, respectively, by the hierarchical methods of single linkage, complete linkage, Ward, median, average linkage within a group and average linkage between groups. The cultivars CNPA Precoce 1 and LD Frego (Figure 1) formed an isolated group in almost all the clustering methods, the exception being average linkage within a group, where each of these genotypes made up its own cluster. Costa (2001) reports that the quantification of genetic diversity using the generalised Mahalanobis distance was effective in highlighting at least one fairly divergent group, formed by two accessions of upland cotton, Del Cerro and Acala SJ-2.
Table 4 shows the cophenetic correlation coefficients between the generalised Mahalanobis distance matrix (D²) and the hierarchical clustering methods.
The cophenetic correlation coefficients (CCC) between the generalised Mahalanobis distance matrix (D²ij) and the cophenetic distance matrix (C) given by the dendrogram for each method were of high magnitude and significant, ranging from 0.80 (CL) to 0.85 (SL, ME and ALBG), demonstrating consistency in clustering (Table 4). Cruz and Carneiro (2003) report that cophenetic correlation is a good criterion to evaluate the consistency of a graphical representation, with values close to one indicating better performance.
The drawback found in these methods, according to Cargnelutti Filho and Guadagnin (2011), is that in general the cophenetic correlation coefficients given by combining the measurements of dissimilarity with the clustering methods SL, CL, ALBG and WARD decrease with the increase in the number of cultivars and variables, affecting the consistency of the clustering and thereby presenting a degree of limitation. Thus, based on the generalised Mahalanobis distance (D²ij), the methods that graphically represented the original matrix with greater consistency (with the greatest cophenetic coefficients) were ME, ALBG and ALWG. On the other hand, the CL method gave the worst representation, with the lowest coefficient. These results are similar to those found by Cargnelutti Filho et al. (2008), where the most consistent methods were ALBG, ME and SL.
The correlation coefficients between the clustering methods ranged from 0.88 (SL/CL) to 0.99 (ME/SL, WA/CL, ALWG/ME, ALBG/ME and ALBG/ALWG). Among the methods evaluated, ALBG showed a higher correlation with the remaining methods (values ranging from 0.85 to 0.99), greater consistency in clustering (CCC = 0.85) and easier visualisation of the groups formed (Figure 1-F).
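A comparison between methods of this kind can be sketched as follows, assuming that each between-method coefficient is the Pearson correlation between the cophenetic matrices of two dendrograms built from the same dissimilarities. The function names and the four-genotype data are hypothetical; only single linkage (minimum), complete linkage (maximum) and average linkage between groups (mean) are covered by this simple linkage rule.

```python
from itertools import combinations
from statistics import mean


def cophenetic(dist, n, link):
    """Cophenetic distances for a linkage rule: min (SL), max (CL) or mean (ALBG)."""
    clusters = [frozenset([i]) for i in range(n)]
    coph = {}

    def between(a, b):
        # Apply the rule to all original dissimilarities between the two groups.
        return link([dist[frozenset((i, j))] for i in a for j in b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: between(*p))
        height = between(a, b)
        for i in a:
            for j in b:
                coph[frozenset((i, j))] = height
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return coph


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def method_correlation(dist, n, link_a, link_b):
    """Correlation between the cophenetic matrices of two clustering methods."""
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    ca, cb = cophenetic(dist, n, link_a), cophenetic(dist, n, link_b)
    return pearson([ca[p] for p in pairs], [cb[p] for p in pairs])


# Hypothetical dissimilarities among four genotypes (indices 0..3).
values = {(0, 1): 1.0, (0, 2): 3.0, (0, 3): 6.0, (1, 2): 4.0, (1, 3): 5.0, (2, 3): 2.0}
dist = {frozenset(pair): d for pair, d in values.items()}
r_sl_cl = method_correlation(dist, 4, min, max)    # single vs complete linkage
r_sl_al = method_correlation(dist, 4, min, mean)   # single vs average linkage
```

Here the two dendrograms share the same topology and differ only in the height of the final merge, so the coefficient is high but below one.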
Considering the dendrogram given by the average linkage between groups, UPGMA (ALBG), it was possible to form three clusters. The first was formed by the cultivars BRS Cedro, BRS Ipê, CNPA ITA 90, BRS Aroeira, BRS Araçá, BRS Seridó, BRS Itaúba and BRS Araripe, the second by BRS Acácia and the third by CNPA Precoce 1 and LD Frego (Figure 1-F). These results are similar to those found by Silva Filho et al. (2005), who, when studying the genetic diversity among commercial cotton cultivars (Gossypium hirsutum L.), placed BRS Cedro, BRS Aroeira and BRS Jatobá in the same group. The same authors showed that there was a tendency for cultivars to group together according to the companies to which they belong, thus demonstrating that each company has its own germplasm for breeding.
According to Menezes et al. (2008), the similarity between the cultivars BRS Ipê and CNPA ITA 90 is high, due to the first having been obtained through a selection within the second. Although there may be exceptions, there was high similarity between most of the genotypes tested, agreeing with other work already done on strains of upland cotton (BERTINI et al., 2006; MENEZES et al., 2008; RAHMAN et al., 2002; ULLAH et al., 2012; ZHANG et al., 2005). According to McCarty et al. (2007), breeding programs for cotton present a narrowing of the genetic base, which requires detailed genealogical studies into the parent plants used in these programs in order to avoid related crossings.
With respect to the genetic dissimilarity observed among the 55 pairs of cultivars, the highest value was found between the cultivars LD Frego and BRS Acácia (1172.28), followed by BRS Itaúba and LD Frego (929.58) and BRS Acácia and CNPA Precoce 1 (860.64). Those of greatest similarity were BRS Itaúba and BRS Araripe (23.01), followed by BRS Araçá and BRS Seridó (26.08). Greater dissimilarity between cultivars indicates more divergence and higher segregation in the progenies originating from crossings between these genotypes.
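The count of 55 pairs follows from the 11 cultivars taken two at a time, and the ranking of pairs is a simple sort on D². The short sketch below uses only the five values quoted in the text; the remaining 50 pairs are not reported here, so the dictionary is deliberately partial and the variable names are illustrative.

```python
from math import comb

n_cultivars = 11
n_pairs = comb(n_cultivars, 2)  # 11 * 10 / 2 distinct pairs of cultivars

# D2 values quoted in the text (5 of the 55 pairs).
reported = {
    ("LD Frego", "BRS Acácia"): 1172.28,
    ("BRS Itaúba", "LD Frego"): 929.58,
    ("BRS Acácia", "CNPA Precoce 1"): 860.64,
    ("BRS Araçá", "BRS Seridó"): 26.08,
    ("BRS Itaúba", "BRS Araripe"): 23.01,
}

# Sort from most to least divergent.
ranked = sorted(reported.items(), key=lambda item: item[1], reverse=True)
most_divergent = ranked[0][0]
most_similar = ranked[-1][0]
```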
Based on the dendrogram given by the average linkage between groups (ALBG) (Figure 1-F) and on the values in the dissimilarity matrix, the cultivars BRS Acácia and LD Frego (1172.28) appear to be the most genetically divergent. These cultivars showed the greatest genetic distance, being in very distinct groups on the dendrogram.
Based on the results obtained by divergence analysis, the pairs of cultivars most suitable for crossbreeding and obtaining segregating populations are: 1) BRS Acácia and LD Frego, the most divergent cultivars by the analysis of genetic divergence. BRS Acácia presented good technological characteristics for its fibre, such as high uniformity (88.03%) and fibre length (32.13 mm), both significant variables for the consumer market in cotton. On the other hand, the variety LD Frego has good agronomic characteristics, such as a high percentage of fibre (42.36%). The crossing of these cultivars may therefore result in populations with high levels of segregation, making it possible to obtain through selection new cultivars that have good agronomic and technological fibre characteristics and are also less preferred by the boll weevil, a characteristic afforded by the frego bracts of LD Frego.
2) BRS Itaúba and CNPA Precoce 1. BRS Itaúba presented good fibre quality, with the second best reliability index (3092.33).CNPA Precoce 1 had the highest reflectance (78.30), this characteristic being of great importance to the cotton market, which prefers whiter fibres.

CONCLUSIONS
1. The characteristics that contributed most to genetic divergence were the average boll weight, elongation at rupture and micronaire index;
2. The clustering method of average linkage between groups (ALBG), based on the generalised Mahalanobis distance, was the most consistent in the assessment of the genetic diversity of genotypes;
3. The cultivars BRS Acácia and LD Frego showed the most genetic divergence. Those having the greatest similarity were BRS Itaúba and BRS Araripe;
4. The recommended crossings for obtaining genotypes with greater diversity are BRS Acácia with LD Frego and BRS Itaúba with CNPA Precoce 1.

to Alves et al. (2003), should reduce the work, time and cost spent on experimentation. According to Rotili et al. (2012), a higher relative contribution by the characteristic of productivity is important in the study of populations when selecting those which are most divergent.

Figure 1 - Hierarchical dendrograms obtained from the generalised Mahalanobis distance using the clustering methods of single linkage (A), complete linkage (B), Ward (C), median (D), average linkage within a group (E) and average linkage between groups (F) for 11 genotypes of upland cotton. Fortaleza, Ceará, 2010

Table 2 - Chemical properties from the soil analysis of Fortaleza, Ceará, 2010

Table 4 - Cophenetic correlation coefficient between the generalised Mahalanobis distance matrix and hierarchical clustering methods, and between the individual hierarchical clustering methods. The lower diagonal shows the coefficients obtained from the generalised Mahalanobis distance D²ij, itself calculated for 11 genotypes of upland cotton. Fortaleza, Ceará, 2010