Multivariate diallel analysis by factor analysis for establishing mega-traits

In plant breeding, univariate diallel models have aided the selection of parents for hybridization. Multivariate analyses allow combining and associating the multiple pieces of information on the genetic relationships between traits. Therefore, multivariate analyses might refine the discrimination and selection of the parents with greater potential to meet the goals of a plant breeding program. Here, we propose a multivariate analysis method for establishing mega-traits (MTs) in diallel trials. The proposed model is applied to the evaluation of a multi-environment complete diallel trial with 90 F1's of single-cross maize hybrids. From a set of 14 traits, we demonstrate how to establish and interpret MTs with agronomic implications. Diallel analyses based on mega-traits represent an important evolution in statistical procedures, since selection is based on several traits. We believe that the proposed method fills an important gap in plant breeding. In our example, three MTs were established: the first formed by plant stature-related traits, the second by tassel size-related traits, and the third by grain yield-related traits. Individual and joint diallel analyses using the established MTs allowed identifying the best hybrid combinations for achieving F1's with lower plant stature, smaller tassel size, and higher grain yield.


INTRODUCTION
Experimental mating designs, especially diallel crosses, are widely used in maize breeding for the selection of superior hybrid combinations. These trials allow one to estimate the general combining ability (GCA), related to additive gene effects, and the specific combining ability (SCA), related to non-additive gene action (Feher et al. 2014, Oliveira et al. 2016). With the estimates of the genetic parameters, the breeder can define which cross combinations reveal greater heterosis in the F1 for the desired trait. The genotype × environment interaction is characterized by the phenotypic performance of the hybrids not being consistent across environments. Responses can be modified by changes that occur intrinsically in the environment; that is, they reflect differences in the sensitivity of hybrids to environmental changes (Ramalho et al. 2012, Allard 1999). The genetic causes of the genotype × environment interaction can be attributed to physiological, biochemical and adaptive factors, and to the scale on which traits are represented (Cruz et al. 2012).
Multi-environment diallel trials aim at obtaining hybrids for a broad region, or for specific regions that enable the hybrid to express its maximum agronomic potential (Hallauer et al. 2010, Ramalho et al. 2012). In these trials it is therefore possible to detect the combining ability × environment interaction (Nardino et al. 2016b, Ogut et al. 2014, Zhang et al. 2016, Mutimaamba et al. 2017), under the assumption that gene effects are heterogeneous among the environments under study. When this interaction is significant, the best way to achieve satisfactory results is to regionalize the recommendation of hybrid cultivars. Breeders select cross combinations between lines after evaluating a set of traits that determine their superiority (Silva et al. 2008). However, in the majority of reports each trait is analysed and interpreted separately, which is often insufficient to precisely predict the phenomenon. Multivariate analyses allow combining and associating the multiple pieces of information on the genetic relationships between traits. Therefore, multivariate analyses might refine the discrimination and selection of the parents with greater potential to meet the goals of a plant breeding program (Ledo et al. 2003).
Although multivariate diallel analysis has been used in some studies, the selection of traits for establishing mega-traits (MTs) has been empirical and often a priori (Ledo et al. 2003, Kostetzer et al. 2009, Nascimento et al. 2010, Hu et al. 2017). The combined use of diallel cross designs and an a posteriori multivariate analysis that considers the correlations among traits to establish the MTs can provide important information for breeders when selecting crosses. This study proposes and validates, in a way not previously reported in plant breeding, a linear model for multi-environment trials that uses multivariate analysis for establishing mega-traits (MTs) and selecting lines in diallel crosses.

Plant Material
We carried out complete diallel crosses (F1's and reciprocals) with ten maize inbred lines (S9 generation, KSP 1 to KSP 10) of the Maize Breeding Program of the KSP Seeds and research Ltda, Pato Branco, PR, Brazil. The lines used in the crosses were derived from single-cross and three-way commercial hybrids with desirable agronomic traits, mainly grain yield. The crosses were carried out in Campo Alto, Clevelândia, PR, Brazil, between October 2011 and March 2012. The lines had previously been selected considering their per se performance, tolerance to the main leaf diseases, plant height, ear height, resistance to lodging and culm breakage, grain yield, and mass of one hundred grains. At the flowering stage, the crosses were made by artificial pollination according to the established genetic design. The ears were manually harvested in the field (at approximately 35% moisture) and later placed in a forced-air dryer until moisture stabilized at 14%. Afterwards, the ears were threshed and the seeds of each cross were split into three equal parts so that the F1's could be grown in multiple environments.

Sites and experimental design
From the crosses, 90 hybrid combinations were obtained, which were evaluated in the 2012/2013 growing season at three sites in the Southern region of Brazil: Pato Branco (PB), Campos Novos (CN) and Frederico Westphalen (FW) (Figure 1). At all sites, the experimental design was a randomized complete block design with three replicates. The experimental units were composed of two 5-m long rows spaced 0.70 m apart. Four border rows were sown around the experiment to minimize edge effects. Sowing was carried out manually, with 300 kg ha⁻¹ of a 5-20-20 NPK fertilizer. When the plants had six to eight fully emerged leaves, 140 kg ha⁻¹ of N was applied in a single topdressing, according to Coelho et al. (2002).
After plant emergence, we carried out manual thinning, adjusting the plant density to 42 plants per plot. This number corresponds to a density of 60,000 plants per hectare. Weed control was performed using pre-emergence herbicide (Atrazine + Simazine) and post-emergence herbicide (Tembotrione). At harvest, 0.5 m at each end of the plot was excluded to avoid edge effects.
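As a quick check of the stated density, the arithmetic can be written out. This is a sketch that assumes the plot area is taken as the two 5-m rows multiplied by the 0.70 m row spacing; the text does not state how the area was computed.

```latex
\[
A_{\text{plot}} = 2 \times 5\,\text{m} \times 0.70\,\text{m} = 7\,\text{m}^2,
\qquad
\frac{42\ \text{plants}}{7\,\text{m}^2} = 6\ \text{plants m}^{-2}
= 60{,}000\ \text{plants ha}^{-1}.
\]
```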

Assessed traits
At the flowering, harvest and post-harvest stages, 14 agronomic traits were evaluated in three plants per plot. Table I shows the traits and their assessment methods.

Analysis of variance
As a first step, data on the 14 assessed traits (Table I) were subjected to univariate ANOVA to verify the assumptions (normality, homogeneity, independence of residuals and additivity of the model). The following model was considered:

$$y_{ijk} = \mu + \beta_{j(k)} + \alpha_i + \tau_k + (\alpha\tau)_{ik} + \varepsilon_{ij(k)}$$

where $y_{ijk}$ is the value of the dependent trait in the jth block (j = 1, 2, and 3) of the kth environment (k = 1, 2, and 3) which received the ith genotype (i = 1, 2, ..., 90); $\mu$ is the overall mean; $\beta_{j(k)}$ is the effect of block j within environment k; $\alpha_i$ is the additive effect of the ith genotype (fixed); $\tau_k$ is the additive effect of the kth environment (random); $(\alpha\tau)_{ik}$ is the non-additive interaction between the ith genotype and the kth environment (random); and $\varepsilon_{ij(k)}$ is the average error associated with $y_{ijk}$, assumed to be normally, identically and independently distributed [IID ~ $N(0, \sigma^2)$].

Factor analysis
As the interaction was significant for all traits, the factor analysis was performed for each environment individually. The following model was considered:

$$X_j = l_{j1}F_1 + l_{j2}F_2 + \dots + l_{jm}F_m + \varepsilon_j$$

where $X_j$ is the jth trait measured in each plot (j = 1, 2, ..., 14); $l_{jk}$ is the factor load of the jth trait on the kth factor (k = 1, 2, ..., m); $F_k$ is the kth common factor; and $\varepsilon_j$ is the specific factor (Cruz & Carneiro 2003). In the expression for the initial factor loads, $\sigma$ is the variance of the specific factor linked to the jth trait. After the initial factor loads were calculated, the Varimax rotation procedure was applied (Mardia et al. 1979) to obtain the final factor loads, from which the mega-traits were chosen. For the factor analysis, we used the original data matrix of the crosses (90 hybrids, three environments, three replicates and 14 traits), and one factor analysis was carried out for each environment. The number of common factors was set equal to the number of eigenvalues greater than one, and the orthogonal model was chosen (Ferreira et al. 2010).
The factor scores were estimated by a matrix equation in which the scores form a vector of dimension 1×m ($E_k$), computed from the vector of dimension v×1 containing the traits of the kth cross.
Data on the 14 traits were standardized and subjected to factor analysis, according to the model described above. Three mega-traits were established, called here as plant stature (MT-1), tassel size (MT-2) and grain yield (MT-3).
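The extraction and rotation steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (the analyses were run in Genes and SAS): it assumes principal-component extraction from the correlation matrix, and the function names and the simulated data (270 plots, nine traits driven by three latent factors) are hypothetical. Factor scores are not computed here.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonal Varimax rotation of a (traits x factors) loading matrix."""
    n_traits, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD-based update of the orthogonal rotation matrix.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_traits
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):  # criterion stopped increasing
            break
        criterion = s.sum()
    return loadings @ rotation

def extract_factors(data):
    """Return (number of factors, initial loadings, rotated loadings)."""
    # The correlation matrix is the covariance matrix of the standardized traits.
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_factors = int((eigvals > 1.0).sum())       # eigenvalue-greater-than-one rule
    # Assumed initial loadings: eigenvector scaled by sqrt(eigenvalue).
    initial = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return n_factors, initial, varimax(initial)

# Hypothetical data: three latent factors, each driving three of nine traits.
rng = np.random.default_rng(0)
latent = rng.normal(size=(270, 3))
pattern = 0.9 * np.kron(np.eye(3), np.ones((1, 3)))   # 3 x 9 loading pattern
data = latent @ pattern + 0.4 * rng.normal(size=(270, 9))

n_factors, initial, rotated = extract_factors(data)
communalities = (rotated**2).sum(axis=1)   # unchanged by an orthogonal rotation
```

Traits would then be assigned to the factor on which they show the largest absolute rotated loading, which is how the mega-traits are read off the final loads.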

Analysis of diallel crosses
The three MTs were subjected to diallel analysis to verify the significance of the interaction and to obtain the estimates of the combining abilities, according to method 3, model I of Griffing (1956). The following estimates were obtained: general combining ability (GCA), specific combining ability (SCA) and reciprocal specific combining ability (RSCA).
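To make the three kinds of estimates concrete, the sketch below computes them from a p × p table of cross means with an empty diagonal (F1's and reciprocals, no parents). It uses the usual least-squares estimators for this design; the function name `griffing_method3` and the simulated table are hypothetical, and the code is an illustration rather than the routine implemented in Genes.

```python
import numpy as np

def griffing_method3(table):
    """Estimate mu, GCA (g), SCA (s) and reciprocal effects (r).

    table[i, j] is the mean of the cross with line i as female and j as male;
    the diagonal (parents) is ignored.
    """
    y = np.asarray(table, dtype=float)
    p = y.shape[0]
    off = ~np.eye(p, dtype=bool)
    y = np.where(off, y, 0.0)                 # drop the diagonal
    total = y.sum()
    t = y.sum(axis=1) + y.sum(axis=0)         # Y_i. + Y_.i for each line
    mu = total / (p * (p - 1))
    g = (p * t - 2.0 * total) / (2.0 * p * (p - 2))
    s = (0.5 * (y + y.T)
         - (t[:, None] + t[None, :]) / (2.0 * (p - 2))
         + total / ((p - 1) * (p - 2)))
    s = np.where(off, s, 0.0)
    r = 0.5 * (y - y.T)                       # r_ij = -r_ji
    return mu, g, s, r

# Hypothetical 10 x 10 table of means (e.g. scores of one mega-trait).
rng = np.random.default_rng(1)
table = rng.normal(loc=100.0, scale=10.0, size=(10, 10))
np.fill_diagonal(table, np.nan)
off_diag = ~np.eye(10, dtype=bool)

mu, g, s, r = griffing_method3(table)
fitted = mu + g[:, None] + g[None, :] + s + r
```

The estimates satisfy the usual restrictions (GCA effects sum to zero, SCA effects are symmetric and sum to zero within each line), and the fitted values reproduce the cross means exactly because the model is saturated for the table of means.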
First, an individual diallel analysis was carried out according to the following model:

$$y_{ij} = \mu + g_i + g_j + s_{ij} + r_{ij} + \varepsilon_{ij}$$

where $y_{ij}$ is the average value of the F1's and reciprocal hybrids (i, j = 1, 2, ..., p); $\mu$ is the overall mean; $g_i$ and $g_j$ are the GCA effects of the ith and jth parent, respectively; $s_{ij}$ is the SCA effect for the cross between parents i and j; $r_{ij}$ is the reciprocal effect, which reveals the differences between parents i and j when used as male or female lines (in the cross ij); and $\varepsilon_{ij}$ is the average error, assumed to be IID ~ $N(0, \sigma^2)$. The following restrictions were considered: $\sum_i \hat{g}_i = 0$, $\sum \hat{s}_{ij} = 0$, and $s_{ij} = s_{ji}$.

Subsequently, a joint diallel analysis was performed considering the three sites. All the effects were assumed to be fixed, except the experimental error. The same restrictions of the individual diallel analysis were considered. The statistical model adopted for each MT was the following:

$$Y_{ijk} = \mu + g_i + g_j + s_{ij} + a_k + ga_{ik} + ga_{jk} + sa_{ijk} + \varepsilon_{(k)ij}$$

where $Y_{ijk}$ is the value of the hybrid combination between parents i and j in environment k; $\mu$ is the overall mean; $g_i$ and $g_j$ are the GCA effects of the ith and jth parent, respectively; $s_{ij}$ is the SCA effect for the cross between parents i and j; $a_k$ is the effect of environment k; $ga_{ik}$ and $ga_{jk}$ are the effects of the interaction between the GCA of parents i and j, respectively, and environment k; $sa_{ijk}$ is the effect of the interaction between the SCA of parents i and j and environment k; and $\varepsilon_{(k)ij}$ is the average error, assumed to be IID ~ $N(0, \sigma^2)$. Estimates of the quadratic components that express the genetic variability of the genotypes studied in terms of general and specific combining ability and reciprocal effect were obtained according to the following expressions, assuming the components as fixed.
where $\hat{\phi}_g$ is the quadratic component associated with the general combining ability; $\hat{\phi}_s$ is the quadratic component associated with the specific combining ability; $\hat{\phi}_r$ is the quadratic component associated with the reciprocal effect; MSG, MSS, MSR and MSE are the mean squares of the general combining ability, specific combining ability, reciprocal effect and error, respectively; and p is the number of lines used in the diallel analysis (Cruz et al. 2012, 2013). The genetic-statistical analyses were carried out with the software Genes (Cruz 2013) and SAS 9.22 (SAS Institute Inc 2010).
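The expressions for the quadratic components are not reproduced in the text. As a hedged sketch, assuming the usual fixed-model expectations of mean squares for a diallel with F1's and reciprocals computed on a per-mean basis, they would take the form below. Any additional divisors for the number of environments or replicates in the joint analysis are not given in the source and are not shown.

```latex
\[
\hat{\phi}_g = \frac{MSG - MSE}{2(p-2)}, \qquad
\hat{\phi}_s = \frac{MSS - MSE}{2}, \qquad
\hat{\phi}_r = \frac{MSR - MSE}{2}
\]
```

With p = 10 lines, as in this trial, the divisor of the GCA component under this assumption would be 2(10 − 2) = 16.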

Factor analysis
It was initially intended to group all the traits within the factors, but the trait aggressiveness of the root system was not grouped in any of the three established factors. The procedure of Johnson & Wichern (2002) was then followed, and the analyses were repeated with four and five factors. In every case, a factor related to the aggressiveness of the root system was obtained, but no factor showed simultaneous relations of this trait with the other evaluated traits, which reveals the absence of correlation between this variable and the others studied. Thus, the minimum number of factors (in this case three) was maintained, which, according to Granate et al. (2008), makes the interpretations more concise. We also chose to maintain three factors because the first three eigenvalues explain approximately 70% of the total variation.
The initial factor loads, initial communalities and final factor loads were calculated (Table SI - Supplementary Material). The factors were identified from the final factor loads. For FW, the first factor was related to plant stature (MT-1), since it presented the greatest final factor loads for the traits related to plant stature, which in this work are EH, PH and EH/PH (Table SI). The second factor was mostly related to grain yield (MT-3), because it presented high factor loads for the traits FKM, GYP and GYH (Table SI). The third factor presented the highest final factor loads for the traits FLD, TL and NTB (Table SI) and corresponds to the mega-trait tassel size (MT-2). For the PB and CN environments, the mega-trait grain yield was grouped in the first factor, tassel size in the second, and plant stature in the third. In this way, the MTs were established from the magnitudes and signs of the final factor loads. The signs of the factor loads reflect the direction of selection for the trait, considering the aims of the breeding program. The MT can vary according to the biological interpretation; i.e., breeders can establish an MT based on the aims of their own breeding program (Ferreira et al. 2010).
The canonical loads were used as weighting coefficients of the standardized traits to obtain the scores of the new MTs derived from the factor analysis. In theory, factor analysis assumes that the traits of a given factor are weakly correlated with the traits of other factors, to the point that the factors are uncorrelated (Cruz & Carneiro 2003).
The trial sites have particular climatic conditions that strongly influence the growth and development of maize. The three sites were chosen because they represent the edaphoclimatic conditions of southwest Paraná, midwest Santa Catarina and northwest Rio Grande do Sul, where corn is grown on small and medium rural properties. Climatic elements vary among these environments. For example, the higher altitude of CN relative to FW and PB results in cooler nights, which are ideal for the cultivation of maize hybrids; Figure 1 shows that average temperatures are lower at this site. The identification of promising crosses for the three environments simultaneously is desired, as well as for each environment individually.

Analysis of variance of the mega-traits
The sums of squares of the multivariate diallel analysis via factor analysis (Table II) revealed significant effects of crosses (C) for MT-1. The same significance was observed for GCA and SCA. The significance of GCA and SCA points to the existence of variability among GCA effects, associated with additive gene effects, and among SCA effects, associated with non-additive effects. Significant GCA effects indicate that the contribution of the inbred lines differed according to the crosses in which they were involved. In turn, the variability among SCA effects indicates that there are hybrid combinations whose performance differed from what was expected based on the GCA effects alone (Aguiar et al. 2004).
Despite the great importance of SCA estimates for selecting the best hybrid combinations, they do not specify which line(s) should be used as male or female in the crosses. For this purpose, information on reciprocal effects is needed (Cruz et al. 2012). In this study, no mega-trait presented significant reciprocal effects (Table II); thus, the magnitude of the traits is not influenced by the direction of the crosses. This suggests that the inheritance of the traits associated with these MTs is mainly controlled by nuclear genes (Vivas et al. 2013).
In the multivariate diallel analysis considering the effects of the interaction (Table II), significance (p < 0.01) was observed for the three MTs for the interactions crosses × environment (C×E), general combining ability × environment (GCA×E), specific combining ability × environment (SCA×E) and reciprocal effects × environment (R×E). The nonsignificant main effects of GCA and SCA for MT-2 and MT-3, together with their significant interactions with the environment, indicate that the selection of lines for hybrid formation is specific to each environment. Thus, the heterotic groups used for one environment cannot be generalized to the others, due to the significant effect revealed by the interaction.
The presence of significant C×E interaction shows that the hybrids respond differently to changes in the environment. For maize, the genotype × environment interaction has been widely studied due to the crop's high breeding level, narrow genetic base, and role as a model crop among outcrossing species. Furthermore, the wide range of environments in which this crop can grow is also a factor (Souza Neto et al. 2015). This type of interaction was also reported in previous studies (Locatelli et al. 2002, Oliboni et al. 2013). The presence of significant effects of SCA on MT-1 indicates that the parents' [...] (Table II).

The partitioning of the sum of squares of the GCA×E interaction is shown in Table III for the three MTs. The results, as well as the discussion, will be presented for each MT separately.

General combining ability for the mega-trait plant stature
The morphological traits of plant stature grouped into MT-1 have received special attention from breeders in recent years. Currently, maize breeders have directed their efforts toward reducing plant stature (Aguiar et al. 2004). Thus, negative GCA estimates are of greater interest, since the additive gene contributions of the lines with such estimates are favorable for stature reduction. Lines 4, 5, 7 and 8 in FW; 4, 5, 6 and 7 in PB; and 4, 5, 7, 8, 9 and 10 in CN showed estimates of additive gene effects favorable for the reduction of plant stature (Table III). Therefore, lines 4, 5 and 7 present negative estimates in the three studied environments. Negative GCA estimates obtained simultaneously for the three traits that compose MT-1 are of high importance in maize breeding programs. Taller plants make harvesting more difficult and are more susceptible to lodging and breaking in regions with a high incidence of winds (Freitas et al. 2013).
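The lines that are favorable in every environment are simply the intersection of the per-environment sets. A short sketch using the line numbers listed above for MT-1 (the helper name `common_lines` is hypothetical):

```python
# Lines with negative (favorable) GCA estimates for MT-1, as listed in the text.
negative_gca = {
    "FW": {4, 5, 7, 8},
    "PB": {4, 5, 6, 7},
    "CN": {4, 5, 7, 8, 9, 10},
}

def common_lines(by_env):
    """Return the lines that appear in the favorable set of every environment."""
    return sorted(set.intersection(*by_env.values()))

stable_lines = common_lines(negative_gca)
print(stable_lines)  # [4, 5, 7]
```

The same helper applies to any other mega-trait once the per-environment sets of favorable lines are known.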

General combining ability for the mega-trait tassel size
Regarding the traits related to tassel size, grouped into MT-2, maize breeding programs have aimed at reducing their magnitudes. Thus, lines with negative estimates are desirable. Lines 2, 3, 6, 7 and 10 in FW; 2, 5, 6, 9 and 10 in PB; and 5, 6, 7, 9 and 10 in CN presented estimates of additive gene effects favorable for the reduction of tassel size (Table III). Lines 6 and 10 showed negative estimates in the three environments.
The reduction of tassel size was an important change in the aims of maize breeding programs in the Corn Belt (Duvick & Cassmann 1999). These authors pointed out that smaller tassels exert lower apical dominance over the ears, a very relevant feature under stress conditions. It is important to mention that a smaller tassel also demands fewer photoassimilates during its development, which favors the adaptation of the crop to higher plant densities. Negative phenotypic and genetic associations between tassel-related traits and grain yield have been described by correlation and path analysis studies in maize (Nardino et al. 2016a, b). The same authors, analyzing a partial diallel cross, showed that there were lines with negative (favorable) effects for the reduction of tassel size. Sangoi et al. (2006) reported that the tassel can suppress the development of the ear in three different ways: by shading the upper leaves, by competing for photoassimilates, and by producing and exporting growth regulators that would otherwise be used in the development of the ear.

General combining ability for the mega-trait grain yield
Positive GCA estimates for the traits related to grain yield, grouped in MT-3, point to the presence of genes with additive effects favorable to increasing yield and its components. Line 5 showed positive and high estimates in FW. In PB and CN, lines 2 and 6 revealed higher GCA estimates and may be considered superior to the average of the lines involved in the diallel. Thus, these lines might be used to increase the yield components and, consequently, grain yield. The oscillation in the GCA estimates is linked to the presence of significant interactions for this MT.
This was expected, considering the quantitative inheritance of the genes that control grain yield and its components. Optimal conditions could be achieved if it were possible to identify a population in which the two parents have the highest estimates for both GCA and SCA. This would be very important because the population would have a high average (since the GCA of the parents is associated with a high frequency of favorable alleles) and the two lines would have good complementarity (provided by the high SCA). Thus, the population would have a large number of heterozygous loci and, consequently, greater potential genetic variability (Ramalho et al. 2012).

Table III. Estimates of general combining ability (GCA) for three mega-traits in a joint diallel analysis with F1's and reciprocals grown in three sites. Bold-highlighted values are the favorable estimates for each mega-trait.
Estimates of SCA were significant for only MT-1, but interactions of SCA×E were significant for all MTs (Table II). Estimates of reciprocal effects (RSCA) and SCA for 90 hybrids related to the three MTs are shown in Figures 2, 3, 4, 5, 6 and 7.

Specific combining ability for mega-trait plant stature
The significance of SCA is not sufficient to recommend a cross since the selection of hybrid combinations should involve lines with high estimates of SCA, where at least one of the parents has high GCA (Benin et al. 2009). Thus, GCA-related additive alleles may provide greater accuracy in the selection of crosses.
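The selection rule stated above (a favorable SCA estimate combined with a favorable GCA estimate in at least one parent) can be sketched as a simple filter. All numeric values below are hypothetical and serve only to show the logic; the function name `favorable_crosses` is also an assumption, and "high" is simplified here to "having the favorable sign".

```python
def favorable_crosses(sca, gca, direction=1.0):
    """Select crosses with favorable SCA where at least one parent has favorable GCA.

    direction=+1.0 treats positive estimates as favorable (e.g. grain yield);
    direction=-1.0 treats negative estimates as favorable (e.g. plant stature).
    """
    selected = []
    for (i, j), s_ij in sca.items():
        sca_ok = direction * s_ij > 0
        parent_ok = direction * gca[i] > 0 or direction * gca[j] > 0
        if sca_ok and parent_ok:
            selected.append((i, j))
    return selected

# Hypothetical estimates for four lines and four crosses (not from Table III).
gca = {1: 0.30, 2: -0.40, 3: 0.10, 4: -0.25}
sca = {(1, 2): -0.50, (1, 3): -0.20, (2, 4): 0.35, (3, 4): -0.15}

# For a mega-trait such as plant stature, negative estimates are favorable.
picked = favorable_crosses(sca, gca, direction=-1.0)
print(picked)  # [(1, 2), (3, 4)]
```

Cross 1×3 is rejected despite its negative SCA because neither parent has a negative GCA, which is the situation the rule is meant to exclude.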
The simultaneous selection of crosses in the three environments would be feasible, since the hybrid combinations 3×9, 2×6, 1×3 and 4×8 were common to all studied environments and revealed negative SCA estimates, favorable for the reduction of the mega-trait plant stature. These combinations with lower PH and EH can therefore be an important source of favorable genes/alleles for the selection of populations/strains with reduced plant height. Taller plants with high ear insertion are more susceptible to lodging and may not be suitable for cultivation in areas with high-wind events or for farmers working with high nitrogen doses (Paixão et al. 2008, Baretta et al. 2016).
Simultaneous selection for the three environments aiming at reducing tassel size would not be efficient, because no combination was common to all sites. The combinations 1×5, 6×8, 4×8, 4×7 and 5×10 are favorable in at least two environments. Souza et al. (2015), studying variance components and canonical correlations in single-cross maize hybrids, reported the presence of genetic variation, which makes the selection of hybrids with smaller tassel size possible. The same authors also showed negative and significant effects of the tassel-related canonical pairs on the grain yield of the single-cross hybrids studied.
Experimental results from diallel analyses for tassel-related traits are scarce in the literature, but they are of great importance in maize breeding programs. One of the main changes introduced by breeding programs in current single-cross hybrids was the reduction of tassel size; that is, of tassel length, tassel mass and the number of primary and secondary tassel branches (Duvick & Cassmann 1999). These modifications reduced apical dominance and, consequently, allowed a more vigorous development of the ear, even under conditions of biotic and abiotic stress.
Simultaneous selection for the three environments aiming at increasing grain yield indicated only the 1×10 cross as common to all three sites. On the other hand, the combinations 5×9, 1×2, 1×8, 6×9, 5×8, 4×6 and 1×2 are favorable in at least two environments. A breeding program that has hybrid combinations with high levels of grain yield is important from the point of view of the recommendation and commercialization of cultivars.

Reciprocal specific combining ability for mega-trait plant stature
Estimates of reciprocal specific combining ability (RSCA, $r_{ji}$) indicate which line is the most promising as female or male parent for the set of traits of agronomic interest. Thus, the correct choice of male or female parent may vary by combination. Silva et al. (2006) point out that the correct choice of the female parent is decisive for the performance of the hybrid and for the final manifestation of the trait(s) when there is a pronounced maternal effect.
The simultaneous selection of crosses for the three environments, considering the reciprocal effects, revealed only the 3×9 cross as common to all environments, but the GCA of these two lines does not have a constant magnitude across the sites. On the other hand, the simultaneous selection of crosses for two environments points to the combinations 2×3, 1×3 and 1×2 as promising for reducing MT-1. It is worth mentioning that the lines involved in these crosses are more likely to reduce three variables simultaneously: plant height, ear height, and their ratio.
Simultaneous selection of specific crosses favorable in the three environments can be achieved with the 3×8 cross, whereas the crosses 2×9, 3×7, 3×6 and 2×9 are common to at least two sites. The small number of combinations found simultaneously is possibly due to the presence of SCA×E interaction. Thus, it is suggested that specific combinations be identified for each site individually.
Simultaneous selection of specific crosses favorable in the three environments was not achieved. Based on the effects of $r_{ji}$, the promising combinations common to two of the three sites are 2×10, 5×7 and 5×10. This finding reinforces the importance of regionalizing the recommendation of single-cross maize hybrids, focusing on exploration and commercialization in the regions near the sites where the F1's were evaluated. This regionalization is important because of the specific edaphoclimatic conditions of the sites.

CONCLUSIONS
Diallel analyses based on mega-traits represent an important evolution in the statistical procedures used to evaluate plant breeding trials, since the simultaneous selection of lines with favorable estimates is based on several traits. We therefore believe that the proposed method fills an important gap in the evaluation of diallel trials and is an important statistical tool for breeders.
In our example, three MTs were established: the first formed by plant stature-related traits, the second by tassel size-related traits, and the third by grain yield-related traits. Individual and joint diallel analyses using the established MTs allowed identifying the best hybrid combinations for achieving F1's with lower plant stature, smaller tassel size, and higher grain yield.