Genetic divergence on castor bean using the Ward-MLM strategy

The objective of this work was to evaluate the genetic diversity in an F3 segregating population of castor bean in the Recôncavo Baiano using the Ward-MLM multivariate technique. The experiment was conducted at the Universidade Federal do Recôncavo da Bahia between April 2009 and March 2010. The 259 genotypes used were derived from crosses between the BRS 149 Nordestina, BRS 188 Paraguaçu, EBDA MPA-17, Sipeal 28 and Mirante 10 varieties. The design was in randomized complete blocks with four replicates, at a spacing of 3 x 1 m. The following traits were evaluated: number of days to the appearance of the first female flower, plant height (cm), number of racemes emitted per plant, total length of the raceme (cm), raceme weight (g), fruit weight per plant (g), number of seeds per raceme, number of seeds per plant, weight of seeds per raceme (g), productivity (kg ha-1) and oil content in seeds (%). Four groups were formed: Group I with 84 genotypes, Group II with 142 genotypes, Group III with 15 genotypes and Group IV with 18 genotypes. The Ward-MLM strategy allows an appropriate clustering of the genotypes, and the variables that contribute most to the divergence are fruit weight per plant, weight of seeds per raceme, raceme weight and productivity.


INTRODUCTION
The castor bean (Ricinus communis L.) is a species widely grown in the Brazilian Northeast. However, planting in the region has relied on unproductive and non-uniform local varieties (OLIVEIRA; ZANOTTO, 2008).
The identification of variability and knowledge of the different genetic constitutions existing in segregating populations are of fundamental importance for the identification of superior genotypes and the establishment of appropriate strategies for improving castor bean populations.
Studies on genetic diversity provide guidelines for identifying parents with favorable genes for producing segregating generations and, consequently, for obtaining genetically improved individuals (COSTA et al., 2006; FIGUEIREDO NETO et al., 2004; GONÇALVES et al., 2009; ROCHA et al., 2009). In this sense, multivariate methods have contributed effectively to identifying genotypes for use in genetic improvement programs of various crops, including the indication of the most representative traits for obtaining genetically different populations (BENIN et al., 2003).
In the prediction of genetic divergence, several multivariate methods can be applied. The choice of the most appropriate method is determined by the accuracy desired by the researcher, the ease of analysis and the way the data are obtained (BEZERRA NETO et al., 2010; CARGNELUTTI FILHO et al., 2008). The most commonly used approach is cluster analysis, which basically involves two stages. The first is the estimation of a measure of similarity (or dissimilarity) between the parents, and the second is the adoption of a clustering technique for group formation (CRUZ; REGAZZI; CARNEIRO, 1997).
Concerning castor bean, few studies of genetic divergence have been conducted. The methods most often used were the Tocher clustering method, used by Costa et al. (2006), Cavalcante et al. (2008) and Bezerra Neto et al. (2010), and the principal components method, as reported by Milani, Dantas and Martins (2009) and Bahia et al. (2008).
The Ward procedure (1963), also known as the Minimum Variance method, forms groups by maximizing the homogeneity within groups. The idea underlying Ward's method is to merge the two groups, R and S, whose union minimizes the sum of squares within groups, i.e., the sum of squared errors (FERREIRA, 2008).
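As a minimal illustration of the minimum variance criterion (a sketch, not the SAS routine used in this study), the code below computes the increase in the within-group sum of squares that results from merging two groups R and S. At each step Ward's method merges the pair of groups with the smallest such increase. The trait values and function names are hypothetical.

```python
def centroid(group):
    """Mean vector of a list of equal-length trait vectors."""
    n = len(group)
    return [sum(col) / n for col in zip(*group)]


def sse(group):
    """Within-group sum of squared deviations from the centroid."""
    c = centroid(group)
    return sum((x - m) ** 2 for row in group for x, m in zip(row, c))


def ward_merge_cost(r, s):
    """Increase in total within-group SSE if groups r and s are merged.

    Closed form: n_r * n_s / (n_r + n_s) * ||mean_r - mean_s||^2.
    """
    n_r, n_s = len(r), len(s)
    c_r, c_s = centroid(r), centroid(s)
    dist2 = sum((a - b) ** 2 for a, b in zip(c_r, c_s))
    return n_r * n_s / (n_r + n_s) * dist2


# Hypothetical groups (two standardized traits per genotype).
group_r = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
group_s = [[5.0, 6.0], [6.0, 5.0]]

cost = ward_merge_cost(group_r, group_s)
# The same quantity obtained directly from the definition.
direct = sse(group_r + group_s) - sse(group_r) - sse(group_s)
```

The closed form and the direct difference of sums of squares agree, which is why the merge cost can be evaluated from group sizes and centroids alone.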
The Modified Location Model (MLM), proposed by Franco et al. (1998), classifies n subjects when p quantitative variables and q qualitative variables are measured in an environment. The q qualitative variables are combined into a single multinomial variable W with m levels, and the model assumes that the m levels of W and the p multinormal variables for each subpopulation are independent.
Thus, the Ward-MLM strategy consists of two stages. In the first, groups are formed by the minimum variance clustering method proposed by Ward (1963), using the dissimilarity matrix of Gower (1971). In the second stage, the mean vector of the quantitative variables for each subpopulation, independent of the W values, is estimated by the MLM procedure (CABRAL et al., 2010; FRANCO et al., 2003; FRANCO; CROSSA, 2002). This strategy therefore makes it possible to define the optimal number of groups and to estimate the group means with high accuracy, using all available information on the genotypes, whether from quantitative or qualitative variables (CROSSA; FRANCO, 2004).
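To make the first stage more concrete, the following sketch computes a Gower (1971) dissimilarity matrix for records that mix quantitative and qualitative variables. It is only an illustration: the three records, the stem colour variable and the function name are hypothetical, and missing values are not handled.

```python
def gower_dissimilarity(a, b, ranges, is_quantitative):
    """Gower (1971) dissimilarity between two records with mixed variables.

    Quantitative variables contribute |a - b| / range; qualitative
    variables contribute 0 when the categories match and 1 otherwise.
    The result is the average contribution over all variables.
    """
    total = 0.0
    for x, y, rng, quant in zip(a, b, ranges, is_quantitative):
        if quant:
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (plant height in cm, oil content in %, stem colour).
genotypes = [
    (180.0, 48.0, "green"),
    (220.0, 50.0, "red"),
    (200.0, 46.0, "green"),
]
is_quantitative = [True, True, False]

# Range of each quantitative variable over the sample (None for qualitative).
ranges = [
    max(g[j] for g in genotypes) - min(g[j] for g in genotypes) if quant else None
    for j, quant in enumerate(is_quantitative)
]

# Symmetric matrix of pairwise dissimilarities, with zeros on the diagonal.
matrix = [
    [gower_dissimilarity(a, b, ranges, is_quantitative) for b in genotypes]
    for a in genotypes
]
```

Because every variable is scaled to the interval from 0 to 1 before averaging, traits measured in different units contribute on a comparable footing to the matrix passed to the clustering step.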
Accordingly, this work aimed to estimate the genetic divergence in an F3 segregating population based on the Ward-MLM procedure, in order to support the castor bean genetic improvement program for the Recôncavo Baiano.

MATERIAL AND METHODS
This study was conducted between March 2009 and April 2010 in the experimental area of the Núcleo de Melhoramento Genético e Biotecnologia (NBIO; Experimental Center of Genetic Improvement and Biotechnology) at the Universidade Federal do Recôncavo da Bahia (UFRB), in Cruz das Almas, BA, at latitude 12°40'39" S and longitude 39°06'23" W, 220 m above sea level. According to the Köppen classification, the climate is a transition between types Am (hot and humid, with a short dry season) and Aw (hot and humid, with summer rainfall). The soil is an alic cohesive Yellow Latosol, clayey in texture, on flat relief. The average annual rainfall is 1,170 mm, the average annual temperature is 24.1 °C and the relative air humidity is 80% (RIBEIRO et al., 1995).
Each plot had 28 rows of 12 m, with 3 m between rows and 1 m between plants within the row, resulting in 12 holes per row and 336 holes per plot. The borders were formed by the lateral rows, leaving a useful area of 10 m x 78 m.
The area was prepared by mowing, subsoiling, plowing and harrowing. To correct the soil, dolomitic limestone was applied at 1000 kg ha-1. The holes, measuring 20 cm x 20 cm, were opened with a hoe. Basal fertilization was applied at 20 kg ha-1 of N, 80 kg ha-1 of P and 40 kg ha-1 of K. Planting was done by direct sowing in the field using three seeds per genotype.
After the emergence of the first true leaf, thinning was carried out (leaving only one plant per hole) and 40 kg ha-1 of N was applied as top dressing in furrows, which were then closed. Weeding and mechanized mowing were carried out to eliminate weeds. Phytosanitary treatments were performed according to crop need.
The following traits were evaluated: number of days to the appearance of the first female flower (FLO), recorded as the number of days from planting until 50% of the female flowers on the primary raceme were open; plant height (PH, cm), measured from the base of the plant to the last branch; number of racemes emitted per plant (NRE), obtained by periodic counts during the crop cycle; and total length of the raceme (TLR, cm), measured on the first three racemes of each plant with a millimeter ruler at full maturity. Raceme weight (RW, g), fruit weight per plant (FWP, g), number of seeds per raceme (NSR), number of seeds per plant (NSP) and weight of seeds per raceme (WSR, g) were measured on the first three racemes of each plant. Productivity (PRO, kg ha-1) was estimated from the number of holes in the useful area and the size of the useful area. The weight traits were measured on a semi-analytical balance.
The oil content in seeds (SOC, %) was analyzed at the Laboratório Avançado de Tecnologia Química da Embrapa Algodão (Advanced Chemical Technology Laboratory of Embrapa Cotton), in Campina Grande, PB. SOC was estimated by low-field 1H nuclear magnetic resonance (NMR), a nondestructive method, using an Oxford MQA 7005 instrument with a 0.47 T electromagnet. The samples were first kept for one hour in a controlled environment at 20 ºC and 60% humidity. The spectra were acquired using a probe with an acrylic cylindrical tube, in which the seeds were placed, and the results were obtained simultaneously on a computer interfaced with the device.
Averages were estimated by the REML/BLUP mixed model methodology using the SELEGEN-REML/BLUP software, according to Resende (2006). The cluster analysis of genotypes was carried out on these averages by the Ward-MLM method using SAS version 9.1.3 (SAS, 2003).

RESULTS AND DISCUSSION
All continuous variables were significant at 5% probability by the F-test, demonstrating variability among the genotypes studied for these descriptors. The optimal number of groups was determined to be four, according to the pseudo-F and pseudo-t2 statistics and the log-likelihood function obtained by the Ward-MLM strategy (Figure 1).
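For readers unfamiliar with the pseudo-F criterion, the sketch below evaluates it for two candidate partitions of a single trait, assuming the statistic is the usual ratio of between-group to within-group mean squares (the Calinski-Harabasz form). The nine values and both partitions are hypothetical; a larger pseudo-F indicates better separated groups.

```python
def pseudo_f(groups):
    """Pseudo-F statistic for a partition of one-dimensional values.

    pseudo-F = [B / (G - 1)] / [W / (n - G)], where B and W are the
    between-group and within-group sums of squares, G is the number of
    groups and n is the number of observations.
    """
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical trait values split into two and into three groups.
two_groups = [[1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [20.0, 21.0, 22.0]]
three_groups = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0], [20.0, 21.0, 22.0]]

f_two = pseudo_f(two_groups)
f_three = pseudo_f(three_groups)
```

In this toy example the three-group partition yields the larger statistic, so it would be preferred over the two-group partition under this criterion.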
The likelihood profile graph associated with the likelihood ratio test showed that the largest increase in the likelihood function, 26.94, occurred at four groups. Everitt (1981), quoted by McLachlan and Basford (1988), suggests that this test can be used if the ratio n/p (number of observations to number of variables) is greater than 5 and n > 5. In any case, the likelihood ratio, or the increase in likelihood, is a useful guide for defining the number of groups. The method can therefore provide more precise criteria for group formation, resulting in less subjective groupings of accessions. The likelihood profile graph has been used to compare different values of G (the number of groups) and to identify the point of maximum increase as a criterion for defining the number of groups (GONÇALVES et al., 2009). Padilla et al. (2005), studying 120 populations of Brassica rapa subsp. rapa L., found that the maximum increase of the likelihood function was reached when five groups were considered. Ortiz et al. (2008), in turn, evaluated high-altitude maize races from Peru and observed the greatest increases in the likelihood function at four and eight groups (increases of 56.22 and 50.60, respectively). According to Gonçalves et al. (2009), the number of groups may vary depending on the species, the number of accessions and the number and type of descriptors.
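The selection rule described above can be sketched as follows. The log-likelihood values are hypothetical and were chosen only so that the largest increase (26.94) falls at four groups, as reported in the text; the actual values appear in Figure 1. The n/p check uses the 259 genotypes and 11 variables of this study.

```python
def best_number_of_groups(log_likelihoods):
    """Return (G, increase) for the largest rise in log-likelihood.

    log_likelihoods maps a number of groups G to the log-likelihood of
    the model fitted with G groups; consecutive values of G are compared.
    """
    gs = sorted(log_likelihoods)
    increases = {
        g: log_likelihoods[g] - log_likelihoods[prev]
        for prev, g in zip(gs, gs[1:])
    }
    best = max(increases, key=increases.get)
    return best, increases[best]


# Hypothetical profile (NOT the values behind Figure 1).
profile = {
    1: -3000.00,
    2: -2990.00,
    3: -2978.00,
    4: -2951.06,
    5: -2945.00,
    6: -2941.00,
}
best_g, increase = best_number_of_groups(profile)

# Everitt's rule of thumb: the test is usable when n / p exceeds 5.
n_genotypes, n_variables = 259, 11
ratio = n_genotypes / n_variables
rule_satisfied = ratio > 5
```

With 259 genotypes and 11 variables the ratio n/p is about 23.5, well above the threshold of 5 cited from Everitt (1981).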
For most traits, group I was superior to groups II and III and inferior to group IV (Table 1). This group was formed by 84 genotypes, of which 50% were derived from crosses between the BRS 149 Nordestina cultivar and the other four cultivars. Its averages for productivity surpassed those of groups II and III, and for seed oil content its averages were higher than those of groups III and IV. Crosses between the best genotypes in groups I and IV may be convenient, allowing high heterosis for the productivity trait and, ultimately, gain from selection.
Group II comprised the largest number of genotypes (Table 1). In this group, 42 genotypes were derived from the cross BRS 149 Nordestina x EBDA MPA 17, 38 from the cross BRS 188 Paraguaçu x EBDA MPA 17 and 54 from the cross Sipeal 28 x BRS 188 Paraguaçu. Approximately 65% of the genotypes are descendants of crosses using the BRS 188 Paraguaçu cultivar, and the remaining 35% are derived from crosses of BRS 149 Nordestina with the EBDA MPA 17 and Sipeal 28 cultivars. This group combined the genotypes with the highest averages for oil content. Cerqueira (2008), in assessing the behavior of these cultivars in the Recôncavo Baiano for the SOC trait, noted that BRS 149 Nordestina and BRS 188 Paraguaçu had the highest averages for oil content, with 53% and 48%, respectively.
On the other hand, group II contained the individuals with the longest cycle, which are undesirable for the study area: the longer the plants remain in the field, the more they favor the build-up of pests and diseases, especially gray mold (Amphobotrys ricini), which drastically reduces crop yield in the Recôncavo Baiano.
Group III presented the lowest averages for productivity, as well as for raceme weight, weight of seeds per raceme, fruit weight per plant, number of seeds per plant and oil content in seeds (Table 1).
This group was composed of 15 genotypes, 14 of which originated from crosses using the EBDA MPA 17 cultivar. It can be inferred that this low productivity is related to the degree of fruit dehiscence of this cultivar. Sampaio Filho (2009), when evaluating the five cultivars used in the partial diallel that gave rise to the F3 segregating population, noted that the EBDA MPA 17 cultivar had the lowest productivity, reaching only 282.83 kg ha-1. Group III also comprised the shortest genotypes (Table 1). Although these genotypes had the worst averages for most traits, they are important for the program, because short stature is a favorable trait: small varieties are desirable in mechanized cultivation.
Of the four groups formed by the Ward-MLM strategy, group IV contained the most promising individuals. This group consisted of 18 genotypes (Table 1), 13 of which were derived from crosses using the Sipeal 28 cultivar.
The productivity averages observed in group IV were higher than those of the other groups. Bahia et al. (2008), on assessing the parents that gave rise to this F3 population, obtained high productivity values for the Sipeal 28 cultivar, with an average of 1347 kg ha-1. This group also combined the genotypes with the highest averages for number of racemes emitted per plant, raceme weight, number of seeds per raceme, weight of seeds per raceme, fruit weight per plant and number of seeds per plant, with oil content in seeds roughly equal to that of the other groups (Table 1). Furthermore, it had the lowest average number of days to flowering. Thus, we can infer that this group contains the genotypes most likely to give rise to early, high-yielding lines.
Regarding the traits evaluated, those that contributed most to genetic diversity, based on the first canonical variable, were fruit weight per plant, weight of raceme per plant, yield and number of seeds per plant. Different results were found by Costa et al. (2006), who, when studying castor bean accessions and cultivars, found that onset of flowering, plant height, oil content in seeds and effective length of the primary raceme were the traits with the highest contribution to divergence, while yield potential was the variable that contributed least.
The dissimilarity between groups, based on the Mahalanobis distance obtained by the Ward-MLM strategy, showed a range of variation of 48.16; groups II and III were the closest and groups II and IV the most distant (Table 2). Crosses between groups II and IV may be performed to exploit heterosis and obtain transgressive individuals for production traits.
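As a hedged illustration of how such a comparison between groups can be made, the sketch below computes squared Mahalanobis distances between group mean vectors for two traits and identifies the closest and the most distant pairs. The means and the pooled covariance matrix are hypothetical (they are not the values of Table 2) and were chosen only to reproduce the ordering described in the text.

```python
from itertools import combinations


def mahalanobis_sq(mean_a, mean_b, cov):
    """Squared Mahalanobis distance between two 2-trait mean vectors.

    cov is the pooled within-group covariance matrix [[s11, s12], [s21, s22]].
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d1 = mean_a[0] - mean_b[0]
    d2 = mean_a[1] - mean_b[1]
    # d' * inverse(cov) * d, written out for the 2 x 2 case.
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det


# Hypothetical group means for two standardized traits.
group_means = {
    "I": (2.5, 1.0),
    "II": (0.0, 0.0),
    "III": (1.0, 0.0),
    "IV": (4.0, 3.0),
}
pooled_cov = [[2.0, 1.0], [1.0, 2.0]]

distances = {
    (a, b): mahalanobis_sq(group_means[a], group_means[b], pooled_cov)
    for a, b in combinations(group_means, 2)
}
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
```

Unlike the Euclidean distance, this measure accounts for the correlation between traits through the covariance matrix, which matters when traits such as raceme weight and seed weight vary together.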
The first two canonical variables obtained by the Ward-MLM methodology explained 95% of the total variation (Figure 2) and made it possible to visualize the genetic variability in the segregating population by graphic dispersion analysis. Barbé (2009), using the same procedure on snap bean lines, obtained 96.0% of the total variation and stated that this high value indicates that the graphic representation of the first two canonical variables is adequate for examining the relationship between groups and between individuals within a group. Costa et al. (2006) obtained similar results for castor bean. Selection in segregating generations should take into account genetic diversity, the per se performance of the parents and allelic complementarity. Thus, the F3 segregating population of castor bean appears to contain promising genotypes for the selection of traits of interest for cultivation in the Recôncavo Baiano.
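The share of variation attributed to the first canonical variables is the cumulative proportion of the corresponding eigenvalues. The sketch below shows the calculation with hypothetical eigenvalues chosen so that the first two account for 95% of the total; with four groups there are at most three canonical variables, hence three eigenvalues.

```python
def cumulative_proportion(eigenvalues):
    """Cumulative share of total variation, largest eigenvalue first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


# Hypothetical eigenvalues for CAN1, CAN2 and CAN3 (not taken from Table 1).
eigenvalues = [7.2, 2.3, 0.5]
shares = cumulative_proportion(eigenvalues)
first_two = shares[1]
```

When the first two shares add up to a value this high, a two-dimensional scatter plot of CAN1 against CAN2 loses little of the information that separates the groups.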

CONCLUSIONS
1. Estimates of genetic divergence based on the Ward-MLM procedure made it possible to identify variability in the F3 segregating population of castor bean and to form coherent groups;
2. The most promising genotypes for selection in the program belong to group IV, owing to their good performance for all traits, especially productivity, earliness and oil content in seeds;
3. The traits that contributed most to genetic diversity are fruit weight per plant, weight of raceme per plant, productivity, weight of raceme and number of seeds per plant.

Figure 1 - Graph of the log-likelihood function showing the optimum number of groups for the F3 segregating population of castor bean, Cruz das Almas, 2010

FLO = number of days to the appearance of the first female flower; PH = plant height; NRE = number of racemes emitted per plant; TLR = total length of the raceme; RW = raceme weight; FWP = fruit weight per plant; NSR = number of seeds per raceme; NSP = number of seeds per plant; WSR = weight of seeds per raceme; PRO = productivity; SOC = oil content in seeds; N = number of genotypes per group; (-) = does not contain information.

Figure 2 - Graphic dispersion of the first two canonical variables (CAN1 and CAN2) representing the formation of four groups by the Ward-MLM strategy in an F3 segregating population of castor bean, Cruz das Almas, 2010

Table 1 - Average of 11 variables in each group and contribution of the traits to the canonical variables CAN1 and CAN2 in an F3 segregating population of castor bean

Table 2 - Separation of clusters by the Ward-MLM strategy in an F3 segregating population of castor bean, based on the Mahalanobis distance
R. S. Oliveira et al.