Genetic diversity between improved banana diploids using canonical variables and the Ward‐MLM method

The objective of this work was to estimate the genetic diversity of improved banana diploids using quantitative data and simple sequence repeat (SSR) marker data simultaneously. The experiment was carried out with 33 diploids in an augmented block design with 30 regular treatments and three common treatments. Eighteen agronomic traits and 20 SSR primer pairs were used. The agronomic and SSR data were analyzed simultaneously with the Ward-MLM strategy, using the CLUSTER and IML procedures of SAS. The Ward clustering method was applied to the combined dissimilarity matrix obtained with the Gower algorithm. The Ward-MLM procedure identified three ideal groups (G1, G2, and G3) based on the pseudo-F and pseudo-t² statistics. The dendrogram showed relative similarity among the G1 genotypes, explained by their genealogy. In G2, 'Calcutta 4' appears in 62% of the genealogies. Similar behavior was observed in G3, in which the 028003-01 diploid is the male parent of the 086079-10 and 042079-06 genotypes. The canonical variable method had greater discriminatory power than Ward-MLM. Although reduced, the available genetic variability is sufficient for the development of new hybrids.


Introduction
Bananas are derived from the wild species Musa acuminata Colla (AA) and Musa balbisiana Colla (BB). The fruit is produced by smallholders, and millions of people in developing countries use it as a staple food. According to the Food and Agriculture Organization of the United Nations (2012), the total area harvested in 2009 was approximately five million hectares, with a production of 95 million tons.
Banana breeding programs use diploids as the basis for developing new commercial hybrids, since sterility is a major constraint in triploids (Amorim et al., 2011). In addition, diploids have traits of agronomic importance, such as disease resistance and high fruit yield (Lessa et al., 2010). The agronomic characterization of these diploids, including the estimation of genetic variability with molecular markers, is important for choosing parents for crosses between divergent genotypes. Such crosses aim to exploit heterosis and to develop new improved diploids, which can then be crossed with triploid and tetraploid genotypes to develop new commercial banana hybrids (Amorim et al., 2008).
Pesq. agropec. bras., Brasília, v.47, n.10, p.1480-1488, out. 2012
Simple sequence repeat (SSR), or microsatellite, markers have been widely used for characterizing species, especially due to their codominant nature, repeatability, and easy data interpretation (Creste et al., 2003). Genetic diversity is usually estimated by considering separately quantitative data (such as plant height and pseudostem girth) and qualitative data, including anthocyanin content, leaf position, pulp color, and molecular marker data (Cabral et al., 2010). Strategies for grouping genotypes based on combined data have also been proposed, using clustering methods such as the Ward method and the unweighted pair group method with arithmetic mean (UPGMA) (Gonçalves et al., 2009).
The modified location model (MLM) procedure, proposed by Franco et al. (1998), is another interesting strategy for quantifying genetic variability using quantitative and qualitative variables simultaneously. The MLM has two stages. In the first, the Ward clustering method is used to define the groups from the Gower dissimilarity matrix. In the second, the mean vector of the quantitative variables is estimated by the MLM, regardless of the values of the qualitative variables. This procedure has been used for a variety of crops, such as common beans (Cabral et al., 2010), tomatoes (Gonçalves et al., 2009), and bananas (Pestana et al., 2011).
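To make the first stage concrete, the Gower dissimilarity for mixed data can be sketched in Python. This is a minimal illustration, not the SAS/IML code used in the study. It assumes equal weights, no missing values, quantitative traits scaled by their range, and each SSR band (scored 0/1) treated as a nominal variable, so a shared absence counts as agreement; Gower's original coefficient for dichotomous variables would instead ignore joint absences. The function name gower_dissimilarity and the toy data are hypothetical.

```python
from typing import List, Sequence


def gower_dissimilarity(
    quant: Sequence[Sequence[float]],
    qual: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return the n x n Gower dissimilarity matrix (1 - Gower similarity).

    quant[i][k]: value of quantitative trait k for genotype i.
    qual[i][m]: code of qualitative variable m (e.g. an SSR band, 0/1) for genotype i.
    Assumes no missing values and equal weights for all variables.
    """
    n = len(quant)
    if n == 0:
        return []
    n_quant, n_qual = len(quant[0]), len(qual[0])
    p = n_quant + n_qual
    # Range of each quantitative trait, used to scale differences to [0, 1].
    ranges = [
        max(row[k] for row in quant) - min(row[k] for row in quant)
        for k in range(n_quant)
    ]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = 0.0
            for k in range(n_quant):
                if ranges[k] > 0:
                    sim += 1.0 - abs(quant[i][k] - quant[j][k]) / ranges[k]
                else:
                    sim += 1.0  # constant trait: every pair agrees
            for m in range(n_qual):
                # Nominal rule: identical codes (shared 1s or shared 0s) agree.
                sim += 1.0 if qual[i][m] == qual[j][m] else 0.0
            d[i][j] = d[j][i] = 1.0 - sim / p
    return d


# Toy data: one quantitative trait (plant height, cm) and two SSR bands.
heights = [[100.0], [150.0], [200.0]]
bands = [[1, 0], [1, 1], [0, 1]]
D = gower_dissimilarity(heights, bands)
```

With these toy values, genotypes 0 and 2 differ in every variable and receive the maximum dissimilarity of 1, while each of the other two pairs receives 0.5.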
The objective of this work was to estimate the genetic diversity of improved banana diploids using quantitative and simple sequence repeats (SSR) marker data simultaneously.

Materials and Methods
Thirty-two improved diploids, developed by the banana genetic breeding program at Embrapa Mandioca e Fruticultura, and the SH3263 diploid, developed by the Fundación Hondureña de Investigación Agrícola, were used (Table 1).
The experiment was carried out in Cruz das Almas, BA, Brazil (12º40'19"S, 39º06'22"W, 220 m above sea level). The climate of the region is humid tropical, Aw to Am according to Köppen's classification, with an annual average temperature of 24.5ºC, relative humidity of 80%, and average annual precipitation of 1,249.7 mm (Agritempo, 2008). Federer's augmented block design (Federer, 1956) was used, with 30 regular treatments (diploids 1 to 28, 32, and 33), each appearing only once in the experiment, with replication only among the plants within the plot, and three common treatments (diploids 29 to 31), used as controls and repeated in each of the five blocks. Each plot consisted of six plants spaced at 2.5 × 2.5 m.
Evaluations were made in the first production cycle and considered the following 18 agronomic traits: plant height (PH, cm); pseudostem girth (PSG, cm); number of tillers during flowering (NTF); number of leaves during flowering (NLF); number of days from bunch emission to harvest (NBH); presence of pollen grains (PPG), scored on a scale in which 1 represents the absence of pollen grains, 2 a small amount, 3 an average amount, and 4 an abundance of pollen grains; number of hands per bunch (NHB); number of fruits per hand (NFH); yellow Sigatoka severity at flowering (SEF) and at harvest (SH), using the scale proposed by Stover (1972); number of leaves at harvest (NLH); weight of the second hand (WSH, kg); fruit length and diameter (LF and FD, respectively, cm); pedicel length and diameter (LP and DP, respectively, mm); presence of seeds (PS), scored on a scale in which 1 is the absence of seeds, 2 represents 1 to 10 seeds, 3 represents 11 to 20 seeds, and 4 represents more than 21 seeds; and stalk length and diameter (LS and DS, respectively, cm).
Twenty pairs of SSR primers were used for molecular characterization: five microsatellite markers from the Ma series (Crouch et al., 1998), 12 from the AGMI series developed by Lagoda et al. (1998), and three from the MaOCEN series (Creste et al., 2006) (Table 2). DNA was extracted from young leaves using the CTAB method (Doyle & Doyle, 1990). Amplification reactions were done in a final volume of 13 µL containing 50 mmol L⁻¹ KCl, 10 mmol L⁻¹ Tris-HCl (pH 8.3), 2.5 mmol L⁻¹ MgCl₂, 100 µmol L⁻¹ of each dNTP (dATP, dTTP, dGTP, and dCTP), 0.2 µmol L⁻¹ of each primer, 50 ng of genomic DNA, and one unit of Taq DNA polymerase. Amplifications were performed in a Perkin Elmer 9700 thermal cycler (Perkin Elmer do Brasil Ltda., São Paulo, SP, Brazil) with the following program: an initial step of 3 min at 94ºC; 40 s at 94ºC, 40 s at the specific annealing temperature of each primer, and 1 min at 72ºC; followed by 30 cycles of 40 s at 94ºC, 40 s at 45ºC, and 60 s at 72ºC; and a final hold at 8ºC. The fragments were separated in 3% UltraPure Agarose-1000 gels (Invitrogen, Carlsbad, CA, USA) under standard conditions, stained with ethidium bromide, visualized under ultraviolet light, and photodocumented with UVITEC equipment.
The intrablock procedure was adopted for agronomic data analysis (Duarte, 2000). Intrablock analysis was …
(1) The first three digits of the genotype code refer to the female parent, the next three to the male parent, and the last two to the selection number.
The analysis was carried out with the SAS software package using the general linear models procedure (PROC GLM). Treatment means were adjusted by least squares using the LSMEANS statement. Standard computational procedures were carried out as proposed by Duarte (2000).
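For an augmented block design in which every control appears once in every block, the least-squares adjustment reduces to a simple correction: the block effect is estimated from the controls only, and each unreplicated regular treatment is corrected by the effect of the block it occupies. The Python sketch below illustrates that arithmetic under these assumptions (no missing plots, one value per plot); it is not the PROC GLM code used in the study, and the function name adjusted_means and the toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def adjusted_means(
    checks: Dict[int, Dict[str, float]],
    regular: Dict[str, Tuple[int, float]],
) -> Dict[str, float]:
    """Adjusted means for an augmented block design.

    checks[block][name]: value of a control in a block (every control in every block).
    regular[name]: (block, value) of an unreplicated regular treatment.
    """
    # Grand mean of all control plots.
    grand = mean(v for plots in checks.values() for v in plots.values())
    # Block effect: block mean of the controls minus the grand control mean.
    effect = {b: mean(plots.values()) - grand for b, plots in checks.items()}
    # Regular treatments: remove the effect of the block they were planted in.
    out = {name: y - effect[b] for name, (b, y) in regular.items()}
    # Controls are balanced over blocks, so their adjusted mean is the raw mean.
    for name in next(iter(checks.values())):
        out[name] = mean(plots[name] for plots in checks.values())
    return out


checks = {1: {"C1": 10.0, "C2": 12.0}, 2: {"C1": 14.0, "C2": 16.0}}
regular = {"D1": (1, 20.0), "D2": (2, 20.0)}
means = adjusted_means(checks, regular)
```

In the toy data, block 2 lies 2 units above the control average, so a regular treatment observed there is adjusted downward by 2, and one observed in block 1 is adjusted upward by 2.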
The quantitative (agronomic) and qualitative (molecular marker) data were analyzed simultaneously using the Ward-MLM procedure (Franco et al., 1998), with the CLUSTER procedure and the interactive matrix language (IML) of the SAS program used for cluster formation. The Ward clustering method was applied to the combined dissimilarity matrix obtained with the Gower algorithm.
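The agglomerative step of Ward's method on a dissimilarity matrix can be sketched with the Lance-Williams recurrence. This minimal Python version is an illustration under stated assumptions rather than the SAS PROC CLUSTER implementation: it squares the input dissimilarities before merging (the usual convention for Ward's method, which is exact only for Euclidean distances, so with Gower dissimilarities it is an approximation), breaks ties by the order in which pairs are stored, and stops when a requested number of groups k remains. The function name ward_clusters and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def ward_clusters(d: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative Ward clustering of a symmetric dissimilarity matrix.

    Returns the k groups (lists of row indices) left after merging.
    """
    n = len(d)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    # Squared dissimilarities between active clusters, keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {
        (i, j): d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    }

    def get(a: int, b: int) -> float:
        return dist[(a, b) if a < b else (b, a)]

    next_id = n
    while len(members) > k:
        # Merge the pair of clusters with the smallest Ward criterion.
        a, b = min(dist, key=dist.__getitem__)
        na, nb, dab = len(members[a]), len(members[b]), dist[(a, b)]
        # Lance-Williams update for Ward's method.
        updated = {
            c: ((na + len(m)) * get(a, c) + (nb + len(m)) * get(b, c) - len(m) * dab)
            / (na + nb + len(m))
            for c, m in members.items()
            if c not in (a, b)
        }
        # Drop every entry involving a or b, then add distances to the new cluster.
        dist = {pair: v for pair, v in dist.items() if a not in pair and b not in pair}
        members[next_id] = members.pop(a) + members.pop(b)
        for c, v in updated.items():
            dist[(c, next_id)] = v  # c < next_id, so the key order is preserved
        next_id += 1
    return sorted(sorted(m) for m in members.values())


x = [0.0, 1.0, 5.0, 6.0]
d = [[abs(a - b) for b in x] for a in x]
print(ward_clusters(d, 2))  # [[0, 1], [2, 3]]
```

Cutting the same hierarchy at a different k only changes where merging stops; in the study, the number of groups was chosen with the pseudo-F and pseudo-t² criteria.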
The correlations between the agronomic variables and the canonical variables were obtained with the CANDISC procedure of the SAS software and represented graphically.
To define the ideal number of groups, the procedure indicated for the MLM model, based on the pseudo-F and pseudo-t² statistics, was used. Once the optimal number of groups was defined, a hierarchical classification was obtained by the Ward method, which provides the initial values needed for the final step of the MLM (Franco et al., 1998).
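The pseudo-F (Calinski-Harabasz) and pseudo-t² (Duda-Hart) statistics can be sketched from a dissimilarity matrix, using the identity that, for Euclidean distances, the within-group sum of squares of a group equals the sum of its squared pairwise distances divided by the group size. SAS computes these statistics internally; this Python version is only an illustration, and with Gower dissimilarities, which need not be Euclidean, the values are approximate. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List


def within_ss(d: List[List[float]], group: List[int]) -> float:
    """Within-group sum of squares from pairwise distances (Euclidean identity)."""
    if len(group) < 2:
        return 0.0
    return sum(d[i][j] ** 2 for i, j in combinations(group, 2)) / len(group)


def pseudo_f(d: List[List[float]], groups: List[List[int]]) -> float:
    """Pseudo-F: between-group over within-group mean squares for a partition.

    Undefined for a single group or when every group has zero spread.
    """
    n, k = len(d), len(groups)
    total = within_ss(d, list(range(n)))
    within = sum(within_ss(d, g) for g in groups)
    return ((total - within) / (k - 1)) / (within / (n - k))


def pseudo_t2(d: List[List[float]], a: List[int], b: List[int]) -> float:
    """Pseudo-t2 for merging groups a and b (undefined if both have zero spread)."""
    wa, wb = within_ss(d, a), within_ss(d, b)
    merged = within_ss(d, a + b)
    return (merged - wa - wb) / ((wa + wb) / (len(a) + len(b) - 2))


x = [0.0, 1.0, 5.0, 6.0]
d = [[abs(p - q) for q in x] for p in x]
print(pseudo_f(d, [[0, 1], [2, 3]]))  # 50.0
print(pseudo_t2(d, [0, 1], [2, 3]))  # 50.0
```

A common rule of thumb is to favor a number of groups at which pseudo-F shows a local peak and pseudo-t² is small, followed by a marked jump in pseudo-t² at the next merge.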

Results and Discussion
The SH3263 (code 32) and 013004-04 (code 33) diploids were not evaluated due to their poor development in the field, mainly caused by banana weevil (Cosmopolites sordidus) attack.
Significant differences were detected between diploids and controls for plant height, number of days from bunch emission to harvest, and number of fruits, which are key characteristics for the selection of improved diploids (Tables 3 and 4). Genotype effects were significant for all agronomic traits except fruit diameter. This result indicates the existence of genetic variability between the diploids and justifies the use of molecular markers and multivariate methods to quantify it.
The correlation between the matrix with all 133 alleles and the matrix with 85 alleles was 0.94, with a sum of squared deviations (SDq) of 0.52 and a stress value (S) of 0.048. According to Kruskal (1964), an S value ≤0.05 indicates excellent precision.
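The three statistics used to compare the full and reduced allele sets can be sketched as follows, taking the pairwise values above the diagonal of each dissimilarity matrix. The stress formula used here, S = sqrt(SDq / sum of squared full-data dissimilarities), is an assumption about how S was computed (a Kruskal-type stress with the full allele set as reference); the paper does not state the formula. The function names and toy matrices are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def upper(d: List[List[float]]) -> List[float]:
    """Values above the diagonal of a square matrix, row by row."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_matrices(
    full: List[List[float]], reduced: List[List[float]]
) -> Tuple[float, float, float]:
    """Return (r, SDq, S) between a full-data and a reduced-data dissimilarity matrix."""
    x, y = upper(full), upper(reduced)
    r = pearson(x, y)
    sdq = sum((a - b) ** 2 for a, b in zip(x, y))  # sum of squared deviations
    stress = sqrt(sdq / sum(a * a for a in x))  # assumed Kruskal-type stress
    return r, sdq, stress


full = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
reduced = [[0.0, 1.0, 2.0], [1.0, 0.0, 4.0], [2.0, 4.0, 0.0]]
r, sdq, s = compare_matrices(full, reduced)
```

In the toy matrices a single changed entry gives SDq = 1 and S = sqrt(1/14), about 0.27; the values reported in the text come from the study's own data.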
In the literature, fewer SSR markers and alleles have been used to characterize banana genotypes than in the present study. Ning et al. (2007) considered ten SSR markers sufficient to genotype 50 banana accessions of different origins, finding 92 alleles. Creste et al. (2004) used nine SSR markers to genotype 49 banana diploids, detecting 115 alleles, while Creste et al. (2003) genotyped 35 banana cultivars using 11 SSR markers, with 67 alleles. Mattos et al. (2010) identified 94 alleles using 13 SSRs to genotype 26 banana accessions.
Estimating the genetic variability between genotypes is a critical point for cluster and genetic diversity analyses between and within populations (Kosman & Leonard, 2005). The Ward-MLM procedure, initially proposed by Franco et al. (1998), has been used in the combined analysis of multicategorical, quantitative, and molecular data. In this context, the correlation was estimated between the dissimilarity matrix obtained with the Gower distance (quantitative data) and the matrix obtained from the binary SSR data with the simple coincidence index. The correlation between the two matrices was r = -0.15. According to Souza & Sorrells (1991), the low association between morphological and molecular data can be attributed to the partial and insufficient representation of the genome when morphological data are used. This low correlation can be explained by the absence of association between the loci that control the morphological characters studied and the alleles identified by the SSR markers. The average genetic dissimilarity between the diploids was 0.49, ranging from 0.33, between the 042052-04 and 042052-03 diploids, to 0.59, between the 017041-01 and 091087-01 genotypes.
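The simple coincidence (simple matching) index for binary SSR data counts, for each pair of genotypes, the proportion of bands on which they agree, whether the band is present in both or absent in both; one minus that proportion is the dissimilarity. A minimal Python sketch, with a hypothetical function name and toy band scores:

```python
from typing import List


def simple_matching_dissimilarity(bands: List[List[int]]) -> List[List[float]]:
    """1 - (matching bands / total bands) for every pair of genotypes.

    bands[i][m] is 1 if band (allele) m was scored present in genotype i, else 0.
    """
    n = len(bands)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Count agreements in both directions: shared presence and shared absence.
            matches = sum(1 for a, b in zip(bands[i], bands[j]) if a == b)
            d[i][j] = d[j][i] = 1.0 - matches / len(bands[i])
    return d


bands = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1]]
D = simple_matching_dissimilarity(bands)
print(D[0][1], D[0][2], D[1][2])  # 0.5 0.25 0.75
```

Because joint absences count as agreement, this index differs from Jaccard-type coefficients, which ignore bands absent in both genotypes.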
The dendrogram based on the combined analysis by the Ward-MLM procedure is shown in Figure 2. The cophenetic correlation was high (r = 0.71, p<0.0001, 10,000 permutations) and adequate, since values of r ≥0.56 are considered ideal, reflecting good agreement with the genetic similarity values (Vaz Patto et al., 2004).
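The significance of a matrix correlation such as the cophenetic correlation is commonly assessed with a Mantel-type permutation test: the rows and columns of one matrix are permuted together, the correlation is recomputed, and the p-value is the share of permutations reaching the observed value. The paper does not state which test produced p<0.0001, so the Python sketch below, with hypothetical names, toy data, and a one-sided alternative, only illustrates the idea.

```python
import random
from math import sqrt
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(
    a: List[List[float]],
    b: List[List[float]],
    permutations: int = 10_000,
    seed: int = 1,
) -> Tuple[float, float]:
    """Correlation between two distance matrices and its one-sided permutation p-value."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [a[i][j] for i, j in pairs]
    r_obs = pearson(x, [b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the objects of matrix b (rows and columns together)
        y = [b[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    # Count the observed labelling as one of the possible arrangements.
    return r_obs, (hits + 1) / (permutations + 1)


pts = [0.0, 1.0, 3.0, 7.0, 12.0, 20.0]
dist = [[abs(p - q) for q in pts] for p in pts]
r, p = mantel_test(dist, dist, permutations=999)
```

With 31 genotypes there are far more possible relabellings than 10,000, so a permutation p-value of this kind is a Monte Carlo estimate.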
The Ward-MLM procedure indicated that the ideal number of groups was three, based on the pseudo-F and pseudo-t² statistics and considering the Gower algorithm. Group 1 comprised seven diploids, group 2 comprised 21, and group 3 comprised three.
The dendrogram shows relative similarity among the group 1 genotypes, which is explained by genealogy, since the 001016-01 (Borneo × Guyod) genotype is the female parent of the 091087-01 and 091079-03 hybrids. The same occurred with the 013019-01 and 01318-01 diploids, which share the wild diploid Malaccensis as one of their ancestors.
Group 2 was formed by 21 diploids, an expected result since the 'Calcutta 4' diploid is present in 62% of these genealogies as the male or female parent (Table 1 and Figure 2). This diploid is widely used in many banana breeding programs as a source of alleles for resistance to black Sigatoka, caused by Mycosphaerella fijiensis Morelet (Amorim et al., 2008). Another diploid, M53, appeared in more than 33% of the genealogies. Therefore, the grouping may be associated with the small number of parents involved in obtaining these hybrids. Similar behavior was observed in group 3, since the 028003-01 diploid is the male parent of the 086079-10 and 042079-06 genotypes.
Amorim et al. (2008) used SSR markers to quantify the genetic variability between cultivated, improved, and wild banana diploids. Cluster analysis based on SSR polymorphism was not able to completely separate the improved, cultivated, and wild diploids. Some diploids were grouped by geographic origin, such as Musa ornata Roxb. and IAC-1, and 'Tjau Lagada' and 'Lidi', whereas no relationship was established for others. There was a tendency for the improved diploids to cluster according to their genealogy.
The first two canonical variables explained 81.59% of the variability between the six groups formed (Figure 3). This value indicates that the graphical representation of the first two canonical variables is appropriate for visualizing the genetic relationships between groups and between accessions within the same group.
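In canonical discriminant analysis, each canonical variable is associated with an eigenvalue λ_k of W⁻¹B, where W and B are the within-group and between-group matrices of sums of squares and cross-products. Under this standard definition, the percentage quoted above is the share of the first two eigenvalues; with g = 6 groups and p = 18 traits there are at most min(p, g - 1) = 5 canonical variables:

$$
\frac{\lambda_1 + \lambda_2}{\sum_{k=1}^{\min(p,\,g-1)} \lambda_k} \times 100 = 81.59\%
$$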
Stalk length had the greatest correlation with the first canonical variable, followed by the number of tillers during flowering and the number of fruits in the second hand, with values of 0.70, 0.51, and -0.27, respectively. For the second canonical variable, the highest correlations were found for the number of hands per bunch (r = 0.56) and the number of fruits per bunch (r = 0.46). When the groups formed by the Ward-MLM method (Figure 2) are compared with those obtained by the canonical variables (Figure 3), the latter show higher discriminatory power, since six groups were formed, compared with only three by the Ward-MLM method. The criteria for separating the groups by the canonical variables were associated with the genealogy of the diploids.
Figure 2. Dendrogram constructed by the Ward-MLM method using the genetic distances from 18 morphoagronomic characters from 31 improved banana diploids in the second production cycle. The genotype code is as shown in Table 1.
Currently, the banana genetic breeding program at Embrapa Mandioca e Fruticultura has 43 improved diploids with genetic resistance to most pests and diseases, including yellow and black Sigatoka and Fusarium wilt, which also present desirable agronomic characteristics, such as short stature, high yield, and drought tolerance.

Conclusions
1. There is genetic variability between the 31 banana diploids developed by Embrapa Mandioca e Fruticultura, enabling crosses for the development of new cultivars.
2. Microsatellite markers are efficient in quantifying genetic variability.
3. Canonical variable analysis provides better clustering of the banana genotypes than the Ward-MLM method.

Figure 1. Resampling analysis for a precise estimation of the genetic variability between the 31 improved banana diploids developed by Embrapa Mandioca e Fruticultura in the first production cycle.

Figure 3. Dispersion of the first two canonical variables (CAN) with the formation of six groups (G1-G6) by the Ward-MLM strategy, considering 31 improved diploids in the first production cycle. Codes of the genotypes are as shown in Table 1.

Table 1. Banana diploid hybrids used in the study.

Table 2. Microsatellite primers used in the study.

Table 3. Analysis of variance for the agronomic characteristics(1) of the 31 improved banana diploids in the first production cycle.

Table 4. Analysis of variance for the agronomic characteristics(1) of the 31 improved banana diploids in the first production cycle.