Morphological phenotypic dispersion of garlic cultivars by cluster analysis and multidimensional scaling

Multivariate techniques have become a useful tool for studying the phenotypic diversity of Germplasm Bank accessions, since they make it possible to combine a variety of different information from these accessions. This study aimed to characterize the phenotypic dispersion of garlic (Allium sativum L.) using two multivariate techniques with different objective functions. Twenty accessions were morphologically characterized for bulb diameter, length, and weight; number of cloves per bulb; number of leaves per plant; and leaf area. Techniques based on generalized quadratic distance of Mahalanobis, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering, and nMDS (nonmetrric MultiDimensional Scaling) were applied and the relative importance of variables quantifi ed. The two multivariate techniques were capable of identifying cultivars with different characteristics, mainly regarding their classifi cation in subgroups of common garlic or noble garlic, according to the number of cloves per bulb. The representation of the phenotypic distance of cultivars by multidimensional scaling was slightly more effective than that with UPGMA clustering.


Introduction
Several countries, including the United States, the United Kingdom, Japan, Germany, Italy, Australia, Chile, the Czech Republic, and Brazil, have created Germplasm Banks for the conservation of garlic accessions (Allium sativum L.) due to limited species recombination (Matus et al., 1999, Stavelíková, 2008).The phenotypic variability in these banks should be quantifi ed in order to allow an effective use of accession collections, and this warrants the need for studies on the morphological characterization of cultivars.
In studies of the genetic divergence of garlic cultivars, UPGMA (Unweighted Pair Group Method with Arithmetic mean) clustering has been used by Mota et al. (2004), Vieira and Nodari (2007), and Buso et al. (2008).However, the objective functions of agglomerative hierarchical methods, such as UPGMA, are conditioned to the hierarchical property of clusters, which may not actually exist.Therefore, multiple techniques are required to make robust inferences about phenotypic variability.
Multi-Dimensional Scaling (MDS) is an ordering technique for dimensional reduction that allows one to map individuals as points in low-dimensional space (generally 2D or 3D) (Manly, 2004;Borg and Groenen, 2005).MDS is especially useful when the relation between individuals is unknown (not an unusual case in germplasm banks), but it is possible to estimate a detachment matrix between them.
In accessions of cocoa, Leal et al. (2008) used a hierarchical method together with an MDS solution to identify accession diversity.Studying genetic variability of Tibouchina papyrus populations, Telles et al. (2010) based their conclusions on the results of UPGMA and MDS analyses.No studies were found in the literature that examined the phenotypic relations of garlic cultivars using the tools of cluster analysis and MDS mapping.
This study aimed to characterize the phenotypic dispersion of garlic cultivars, based on morphologic characteristics, using two multivariate techniques with distinct objective functions.

Materials and Methods
The following morphological variables were used: bulb diameter (DIAM), bulb length (BL), average bulb weight (BW), number of cloves per bulb (NCB), number of leaves per plant (NLP), and leaf area (LA), obtained from readings of a leaf area integrator.The data were obtained from an experiment comprised of twenty garlic accessions.Accession origins and popular names are described in Table 1.
The experiment was carried out in 2010 in an experimental area in Viçosa, state of Minas Gerais, Brazil (20º45' S; 42º51' W; average altitude of 650 m).The experimental design was a completely randomized block with four replications.The experimental units consisted of four longitudinal rows 1.0 m long, spaced at 0.25 cm, with plants every 0.10 cm, totaling 40 plants per experimental unit.Plants in the two central rows were considered useful, and those located 0.10 cm from each edge were excluded.After harvesting, plants were submitted to the curing process in the fi eld and in warehouses for 3 and 60 days, respectively.After the fi eld curing process, plants had their aerial parts removed 0.10 cm above the bulbs and their roots entirely removed.The variables NLP and LA were measured immediately after harvesting, while the others were evaluated after the warehouse curing process.
Based on the morphological phenotypic characteristics described, the detachment among accessions was measured by two multivariate techniques: cluster analysis and multidimensional scaling, described as follows.
Prior to the cluster analysis, the residual variances and covariances between variables were estimated.We calculated the means of each accession for each variable.Based on the estimates, the generalized quadratic distance of Mahalanobis was adopted as a dissimilarity measure between the accessions to carry out the clustering by UPGMA in accordance to the algorithm proposed by Lance and Williams (1967) to update detachments between entities during the clustering process.This is shown in equation 1: (1) where i and j are newly clustered entities and k is another entity to be clustered.To reach the objective function of UPGMA method, the following parameterization is used (equation 2): (2) where  i ,  j ,  and  are the required parameters for the algorithm; n i and n j show the number of accessions in entities i and j, respectively.
The Mojena (1977) method, based on the relative size of the fusion levels in the dendrogram, was used to indicate the cut-off point.The purpose was to select the number of groups in stage j that, primarily, complete  j > , where  j is the fusion level corresponding to stage j (j = 1, 2, ..., g -1),  is the reference cut-off point given by: k , where  and ˆ  are, respectively, the mean and standard deviation of  values; k is a constant whose value to be adopted, as suggested by Milligan and Cooper (1985), is k =1.25 as a cut-off point for the defi nition of the number of groups.
The generalized quadratic distance of Mahalanobis also allowed us to quantify the relative contributions of each of the p = 6 characters to phenotypic detachment using the criterion proposed by Singh (1981), based on S j (j=1, 2, ..., p) statistics.In this case, it was considered that the generalized quadratic distance of Mahalanobis between accessions i and i' can be explained by equation 3: ( 3 ) And, for this distance, the contribution of the j-th character is given by equation 4: where  jj , is the element of the j-th line and j'-th inverse column of the matrix for residual variances and covariances, and d j represents the difference between accession i and accession i' for the j-th character.Then, the value of S .j is obtained by the contributions of the j-th character at g(g-1)/2 detachment between accessions.
The second multivariate analysis technique was based on nonmetric multidimensional scaling (nMDS) where only information about the order of dissimilarities was used to obtain a geometric representation.In this case we used the representation of bidimensional space of the generalized detachment matrix of Mahalanobis.Formally, nMDS is defi ned by a monotonic function   ij ij f : d d whose objective is to represent the information of dissimilarity ordering of g accessions, such as , where d ij ,  ij d , and  ij represent, respectively, the dissimilarity, the disparity, and the mapping error for accessions i and j in a particular number of dimensions.A dispersion diagram was used to show the results of the nMDS solution.The analysis of adjustment level in two dimensions of multidimensional space was observed using the formula (equation 5) of Kruskal's Stress 1 : (5) 48 % of the last fusion level (Figure 1).Based on this information, we identifi ed three clusters of accessions, two of which consisted of a single accession (BGH-5956 and BGH-6389).Therefore, one cluster accounted for 90 % of the accessions.Considering the fusion level of this cluster of approximately seven in the dissimilarity measure, this means that circa 35 % of the dissimilarity (or 65 % of similarity) of the accessions within the cluster is in relation to the last fusion level.
The stress value of the nMDS solution with representation in two dimensions was 11.5 %.According to the table proposed by Sturrock and Rocha (2000), at this stress value there is a 1 % probability that the accessions are randomly arranged in bidimensional space.The results of this solution are shown in Figure 2. Dimension 1 shows a great distance between accession BGH-5956 and the other 19 accessions.Likewise, dimension 2 primarily refl ects the phenotypic divergence of accession BGH-6389 in relation to the others.Therefore, accessions BGH-5956 and BGH-6389 are atypical points.In a study carried out by Menezes Sobrinho et al. (1999), the cultivar Gigante Lavínia (BGH-5956) was also allocated to a cluster different from the other accessions, as in our study.
The nMDS solution in two dimensions provided results similar to those obtained from the cluster method.However, with the geometric representation of the distances, it is possible that the formation of four clus-The table of stress values proposed by Sturrock and Rocha (2000) was used to validate the result obtained with the nMDS solution by comparing the value obtained with the stress value generated from random matrices with the same number of objects and the same number of dimensions (twenty and two, respectively, in this study).
A graphic representation of the original detachment matrices, cophenetic distances obtained from UPGMA method and disparities, all ordered according to the number of groups, was made to visualize the degree of dissimilarity (or similarity) within and between groups.To standardize representation to the scale [0, 1], the detachments were divided by the maximum distance in each matrix.We also calculated the correlations between matrices of original distances, cophenetic distances, and disparities.
The matrix for residual variances and covariances was obtained from the 'Experimental Statistics' module and distances between accessions were calculated by the 'Multivariate Analysis' module, both part of Genes software version 2009.7.9 (Cruz, 2006).The cluster analysis was carried out with the 'hclust' function of the stats package, nonmetric multidimensional scaling with the 'isoMDS' function of the MASS package, and the graphic representation of detachment matrices with the 'image' function of the graphics package.All of them are packages of R software version 2.15.2 (R Core Team, 2012).

Results and Discussion
In the Mahalanobis distance, accessions BGH-5940 and BGH-5956 had the greatest divergence (26.73, data not shown), and accessions BGH-5946 and BGH-5948 had the smallest distance (0.22, data not shown).More than 60 % of these divergences are attributed to differences between productive characteristics of plants, namely number of cloves per bulb (NCB) (43 %, data not shown) and average bulb weight (BW) (20 %, data not shown), in relation to the measurement of relative importance proposed by Singh (1981).These results are in accordance with those obtained by Menezes Sobrinho et al. (1999) who, using principal components and canonic variables for the morphological characterization of garlic germplasm, reported that NCB and BW are among the ten features evaluated that most contributed to the morphological distinction of accession groups of the Germplasm Bank in Brasília, Federal District, Brazil (15º47' S; 47º55' W).In a study on garlic diversity based on morphological features, Panthee et al. (2006) used cluster analysis and principal components analysis and observed that BW, DIAM, BL, and NCB were the characteristics that most contributed to the diversity of the 179 accessions.
Based on the method of Mojena (1977), we applied a cut-off point to the dendrogram of the UPGMA method at a dissimilarity of 9.2, which corresponds to ters may be more appropriate, according to the separation caused by the shades in Figure 2.Such formation indicates an average within-cluster distance (for the two clusters with more than one accession) of approximately four, which represents less than 20 % of the higher disparity.If a cut-off in the dendrogram were made allowing the formation of fi ve clusters, the exact same clusters identifi ed with the nMDS solution would be made and the fusion level of these clusters would represent approximately 21 % of the last level.It is apparent, therefore, that the Mojena method was not very strict, given that the criterion used to determine the number of clusters was infl ated by the high fusion level in the accession BGH-5956 in relation to the others, because the variance of fusion levels is more affected than the average.The most appropriate cluster formation can be visualized graphically by the matrix representation of the dissimilarity level between the accessions based on the original distances ordained in accordance to the k number of clusters.A good representation originates a good block-diagonal structure, i.e., an ideal cluster has accessions that present dissimilarity equal to zero for all accessions of this cluster and equal to one for all accessions of the other cluster.Figure 3 was used as the basis for cluster formation, and k = 3, k = 4, and k = 5 were used.The colors used to represent dissimilarity level between accessions ranged from black (no dissimilarity) to white (highly dissimilar).The formation of four clusters made it possible to identify clearly the more similar accessions, showing a block-diagonal structure.In addition, the blocks showed an internal dissimilarity level lower than 20 % of the maximum dissimilarity.
Representation of original distances is more accurately reached with disparities than with cophenetic distances (Figure 3).Indeed, once in the cophenetic matrix, the distance between any accession of one cluster to an accession of another cluster is the same, i.e., the fusion level of clusters.The result of the nMDS solution, i.e., the disparity matrix strongly correlated (0.96) with the matrices for original distances, and the correlation of the cophenetic matrix was higher (0.86) (Table 2).Thus, both techniques were correlated (0.88).
Figure 4 shows the average of the four clusters formed by the garlic accessions for the morphological characters of higher relative importance (NCB and BW).Cluster 1, containing seven accessions, represents accessions that can be used as pertaining to the common subgroup, according to the olericultural classifi cation of Embrapa (Luengo et al., 1999).Of the nine cultivars from the municipality of Viçosa (Minas Gerais, Brazil), two (Cateto Roxo and Amarante Aimorés) were located in Cluster 1. Cluster 2, containing eleven accessions, was characterized by noble cultivars with average BW.
Not only did accession BGH-5956 had higher average BW, it also presented high NCB, which is an undesirable characteristic from the commercial perspective.Oliveira et al. ( 2010) also observed that the Gigante Lavínia cultivar (BGH-5956) stood out for showing higher BW (25.2 g), which is far below the value found in the current study (47.4 g).This can be obviously explained by edaphoclimatic differences.On the other hand, accession BGH-6389 can be considered an ideal cultivar for the market, given its high BW and an NCB around 15 (characteristic of noble garlic).For the Amarante, Cateto Roxo and Caturra cultivars, located in Cluster 2 in the current study, Oliveira et al. (2010) obtained results (except for environmental interference) consistent with BW averages ranging from 16 to 20 g, approximately, and NCB of around 10 to 15.

Conclusions
The two multivariate techniques with distinct objective functions made it possible to identify cultivars with different characteristics, mainly, in terms of subgroup classifi cation of noble and common garlic, according to the number of cloves per bulb.The representation of phenotypic dispersion of cultivars by multidimensional scaling was slightly more effective than the representation of the cluster analysis with the UPGMA method.

Figure 1 −
Figure 1 − Dendrogram obtained by UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering based on the generalized quadratic distance of Mahalanobis among twenty garlic accessions.

Figure 2 −
Figure 2 − Scatter plot in bidimensional space of disparities among twenty garlic accessions.

Figure 3 −
Figure 3 − Matrix representation of the dissimilarity level among the accessions for original distances, cophenetic distances, and disparities clustered in accordance to k number of clusters.

Figure 4 −
Figure 4 − Average bulb weight (BW) and number of cloves per bulb (NCB) of four clusters of garlic accessions.

Table 1 −
BGH identifi cation, origin, and popular name (or variety name) of twenty garlic accessions registered at BGH/UFV.

Table 2 −
Correlation matrix between original distances, cophenetic distances and disparities.