Selection to high productivity and stink bugs resistance by multivariate data analyses in soybean

ABSTRACT: Stink bugs that attack soybean cause significant losses in seed production, quality and germination potential, and also hinder mechanized harvesting. To develop insect-resistant materials, the breeder can build a selection index by factor analysis. The objective of this work was therefore to validate the use of factor analysis, by means of its estimated gains, for the selection of highly productive and stink bug-resistant genotypes in two segregating soybean populations. Phenotypic evaluation was performed in the F2:3 generation, in two distinct experiments, with populations derived from the crosses IAC-100 × PI 295952 and IAC-100 × PI 306712. The experiments were installed in an 18 × 9 alpha-lattice design, with three replicates for each population. Agronomic and resistance characters were evaluated. Factor scores for each character were obtained to create “supercharacters”, which were designed to check whether selection on the new characters could provide satisfactory simultaneous gains in the original characters. Subsequently, analysis of variance was performed for all factors in both populations. The F test showed variability among genotypes, allowing the selection of superior genotypes. None of the factors selected progenies with all characters in the favorable direction, so their use was not advantageous in either population. Complementary studies with other selection indices should therefore be performed in these populations.


INTRODUCTION
Stink bugs that attack soybean [Glycine max (L.) Merrill] are responsible for significant losses in seed production, caused by grain or pod abortion, as well as for leaf retention, which hinders mechanized harvesting (Gazzoni 1998).
In Brazil, the most frequent stink bug species are Nezara viridula (L.), Piezodorus guildinii (Westwood) and Euschistus heros (F.) (Heteroptera: Pentatomidae) (Medeiros and Megier 2009). The increased use of insecticides to control these pests, besides causing environmental damage and raising the cost of crop production, has contributed to the selection of resistant insect populations.
The incidence of insect pests is rising because of the expansion of cultivated area and the successive cultivation of crops, as occurs in soybean. In addition, according to Panizzi et al. (1986) and Lustosa et al. (1999), breeding directed at grain productivity and at the quality of plants or their derivatives can make them more susceptible to insects. Therefore, the development of resistant cultivars would reduce the use of insecticides, bringing economic, ecological and social benefits (Ventura and Pinheiro 1999).
To develop insect-resistant materials, the breeder can build a selection index by factor analysis. This multivariate technique can be used to predict selection gains, either replacing the traditional method or complementing simultaneous selection techniques based on selection indexes.
For Cruz and Carneiro (2006), factor analysis is a significant alternative because it structures and simplifies the original data so that a large number of variables comes to be represented by a smaller number, expressed by linear combinations of these original data, called factors. Traits grouped in one factor are intensely correlated with each other and weakly correlated with other factors.
These factors are extracted by principal component analysis, which reduces a set of n variables to factors that capture the maximum amount of the original variation while remaining mutually independent (Cruz and Carneiro 2006).
Factor analysis has been used in some studies to develop indexes that allow accurate selection of maize genotypes (DoVale et al. 2011; Reis et al. 2017). However, there is no information on the use of factor analysis in more than one segregating population or for resistance to the stink bug complex in soybean. Therefore, the objective of this work was to validate the use of factor analysis, by means of its estimated gains, for the selection of highly productive and stink bug-resistant genotypes in two segregating soybean populations.

MATERIAL AND METHODS
Three parents were used to perform the crosses. One of them, the cultivar IAC-100, was used in the synthesis of both populations. This cultivar was developed by the Agronomic Institute of Campinas (IAC, from the Portuguese Instituto Agronômico de Campinas), state of São Paulo, and shows resistance to the stink bug complex through several mechanisms (Carrão-Panizzi and Kitamura 1995; Pinheiro et al. 2005). The other two parents were the exotic soybean genotypes PI 295952 and PI 306712.
The phenotypic evaluation was performed in the F2:3 generation in two distinct experiments. The population derived from the cross IAC-100 × PI 295952 was designated "population 1", and the population derived from IAC-100 × PI 306712 was designated "population 2". Each was composed of 160 individuals and their respective parents (IAC-100 and one of the exotic genotypes). The experiments were installed in December 2013, in a homogeneous area at the Anhumas Experimental Station (22°17'43"S and 51°23'14"W), Department of Genetics, Luiz de Queiroz College of Agriculture (ESALQ/USP), in the city of Piracicaba, SP. Two border rows of the susceptible cultivar BRS133 were sown between the useful rows of each experiment to avoid damage to the plots during the evaluations and to approximate the environmental conditions of a commercial field. Each experimental plot consisted of five plants derived from one F2 plant, and the average of these plants was used in the statistical analysis. Plants were spaced 50 cm apart within the useful rows, and the useful rows were 1.5 m apart, totaling 3.75 m² per plot. The 18 × 9 alpha-lattice design was adopted, with three replicates.
No chemical insect control was performed and, after flowering, the level of stink bug infestation was evaluated by the beat cloth method for ten weeks, with a minimum of eight sample points per day of evaluation.
The agronomic characters evaluated were: number of days to flowering (NDF); number of days to maturity (NDM); plant height at maturity (PHM), in cm; lodging (L), evaluated at maturity on a visual rating scale from 1 to 5 (1 corresponding to an erect plant and 5 to a fully lodged plant); agronomic value (AV), evaluated at maturity on a visual rating scale from 1 to 5 (1 corresponding to a plant with no agronomic value and 5 to a plant with excellent agronomic characteristics); and grain yield (GY), evaluated as the grain weight (g) of each plant.
Characters associated with insect resistance were also evaluated: grain filling period, in days (GP); leaf retention (LR), evaluated at maturity on a visual rating scale from 1 to 5 (1 attributed to a plant without leaf retention, i.e., normal senescence, and 5 to a plant with total leaf retention, i.e., green leaves and stems); weight of one hundred seeds, in g (WHS), from a random sample after moisture standardization; and healthy seed weight, in g (HSW), that is, the weight of seeds without damage caused by stink bugs, evaluated using sieves after grain harvest and processing, as proposed by Rocha et al. (2014).
Initially, analyses of variance were performed for all evaluated characters in each population, using the following model (Eq. 1):

$$Y_{ijk} = \mu + \alpha_i + r_j + b_{k(j)} + \varepsilon_{ijk} \qquad (1)$$

where: $Y_{ijk}$ is the value observed for the character in the i-th genotype, in the j-th repetition, in the k-th block; $\mu$ is the overall mean for the character; $\alpha_i$ is the effect of the i-th genotype (i = 1, 2, 3, ..., 160), considered as random, with $\alpha_i \sim NID(0, \sigma^2_g)$; $r_j$ is the effect of the j-th repetition (j = 1, 2, 3); $b_{k(j)}$ is the effect of the k-th block within the j-th repetition (k = 1, 2, 3, ..., 9); $\varepsilon_{ijk}$ is the random error associated with observation ijk, with $\varepsilon_{ijk} \sim NID(0, \sigma^2)$.
To perform a principal component analysis, it is assumed that $X_{ij}$ is the standardized mean of the j-th character (j = 1, 2, ..., v) evaluated in the i-th genotype (i = 1, 2, ..., g). The principal components technique consists of transforming the set of v characters ($X_{i1}, X_{i2}, \dots, X_{iv}$) into a new set ($Y_{i1}, Y_{i2}, \dots, Y_{iv}$), whose elements are linear functions of the $X_{ij}$ and independent of each other. Therefore, a principal component can be given by the following linear combination (Eq. 2):

$$Y_{ij} = V_{j1} X_{i1} + V_{j2} X_{i2} + \dots + V_{jv} X_{iv} \qquad (2)$$

where $V_{j1}, V_{j2}, \dots, V_{jv}$ are the elements of the j-th eigenvector of the phenotypic correlation matrix. Among all components, the first presents greater variance than the second, and so on. Additionally, the covariance between each pair of components is zero. Each eigenvalue corresponds to an eigenvector with the same number of elements as the initial characters.

The factor analysis model used (Eq. 3) was suggested by Cruz and Carneiro (2006):

$$X_j = l_{j1} F_1 + l_{j2} F_2 + \dots + l_{jm} F_m + \varepsilon_j \qquad (3)$$

where: $X_j$ is the j-th character (j = 1, 2, ..., v); $l_{jk}$ is the factorial load for the j-th character associated with the k-th factor (k = 1, 2, ..., m); $F_k$ is the k-th common factor; $\varepsilon_j$ is the specific factor. The initial factorial load represents the correlation between character j and factor k, defined by Eq. 4:

$$l_{jk} = \sqrt{\lambda_k}\, V_{kj} \qquad (4)$$

where: $\lambda_k$ is the k-th eigenvalue greater than 1 obtained from the phenotypic correlation matrix; $V_{kj}$ is the j-th value of the k-th eigenvector.
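The extraction of initial factorial loads described above can be illustrated with a minimal NumPy sketch. It assumes that each load is the eigenvector element scaled by the square root of its eigenvalue and that only eigenvalues greater than 1 are retained. The function name `initial_loadings` and the simulated data are hypothetical and are not the values of this study.

```python
import numpy as np

def initial_loadings(means):
    """Initial factor loads from a genotypes x characters array of means."""
    # Phenotypic correlation matrix between characters (v x v).
    R = np.corrcoef(means, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalues greater than unity
    # Load of character j on factor k: sqrt(lambda_k) * V_kj.
    loads = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loads

# Simulated example (assumption): 162 genotypes, 4 characters, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(162, 2))
means = latent[:, [0, 0, 1, 1]] + 0.3 * rng.normal(size=(162, 4))
eigvals, loads = initial_loadings(means)
```

Because the analysis is based on a correlation matrix, the eigenvalues sum to the number of characters, and the sum of squared loads in each retained column equals the corresponding eigenvalue.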
The fraction of the variance of $X_j$ explained by the common factors is called communality and is defined by Eq. 5:

$$C_j = l_{j1}^2 + l_{j2}^2 + \dots + l_{jm}^2 \qquad (5)$$

The technique involves several stages: establishing the number of common factors to be used, calculating the initial loads of these factors, rotating the factors to obtain the final loads and allow the factors to be interpreted, and, as the last step, estimating the factor scores.
Each eigenvalue equal to or greater than 1.00 corresponds to an eigenvector consisting of as many values as there are original characters. Therefore, a given factor has individual loads for all characters, which explains the name "common factor" used in factor analysis theory.
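As a small worked illustration, the communality of each character can be computed as the sum of its squared loads across the retained factors. The sketch below uses a hypothetical load matrix (three characters, two factors), not values from this study, and also checks that an orthogonal rotation leaves the communalities unchanged.

```python
import numpy as np

# Hypothetical factor loads: rows are characters, columns are common factors.
loads = np.array([[0.9, 0.1],
                  [0.8, -0.2],
                  [0.1, 0.7]])

# Communality: sum of squared loads of each character over the factors.
communality = (loads ** 2).sum(axis=1)          # 0.82, 0.68, 0.50
# Remaining fraction of the standardized variance (specific factor).
specificity = 1.0 - communality

# Any orthogonal rotation of the factors preserves the communalities.
theta = 0.6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rotated_communality = ((loads @ rotation) ** 2).sum(axis=1)
```

This invariance is why communalities can be reported once even when both initial and final (rotated) loads are presented.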
The rotation method used was varimax, since it was assumed that the common factors are orthogonal to each other (Cruz and Carneiro 2006). This is an important step in factor analysis to quantify the effect of each common factor on the expression of the characters. The final scores are obtained by Eq. 6, presented by Manly (1986):

$$F^{*} = (\Lambda' \Lambda)^{-1} \Lambda' X \qquad (6)$$

where: $F^{*}$ is the vector of dimension m × 1 of factorial scores; $\Lambda$ is the matrix of dimension p × m of the final factorial loads; $X$ is the vector of dimension p × 1 of the characters of the k-th genotype.
In this way, the factorial scores for each character were obtained to create the "supercharacters", which were designed to check whether selection on these new characters could provide satisfactory simultaneous gains in the original characters.
When the factors created by summing the products of the factorial scores and the characters presented negative values, they were transformed by adding the constant 1 + |z|, where |z| is the magnitude of the lowest value.
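The rotation and scoring steps can be sketched as follows. This is a minimal illustration, assuming the commonly used SVD-based iteration for the varimax criterion and assuming that the scores take the least-squares form F* = (Λ'Λ)⁻¹Λ'X. The function names and the numeric loads are hypothetical, and the Genes program may implement these steps differently.

```python
import numpy as np

def varimax(loads, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x m matrix of initial loads."""
    p, m = loads.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loads @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        grad = loads.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                        # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break                                # no further improvement
        criterion = new_criterion
    return loads @ rotation, rotation

def factor_scores(Z, final_loads):
    """Scores (Lambda' Lambda)^-1 Lambda' x for each row x of Z (g x p)."""
    A = final_loads.T @ final_loads
    return np.linalg.solve(A, final_loads.T @ Z.T).T

# Hypothetical initial loads: 4 characters, 2 factors.
initial = np.array([[0.8, 0.4],
                    [0.7, 0.5],
                    [0.6, -0.5],
                    [0.5, -0.6]])
final, rotation = varimax(initial)

# Consistency check: data generated exactly from known scores are recovered.
true_scores = np.array([[1.0, -0.5], [0.0, 2.0], [-1.5, 0.3]])
Z = true_scores @ final.T
scores = factor_scores(Z, final)
```

Since the rotation matrix is orthogonal, the rotated factors remain uncorrelated under the model and each character keeps the same communality as before rotation.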
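The transformation applied to factors with negative values can be illustrated with a short sketch. The function name `shift_if_negative` and the example values are hypothetical.

```python
import numpy as np

def shift_if_negative(factor_values):
    """Add 1 + |z| when negative values occur, z being the lowest value."""
    factor_values = np.asarray(factor_values, dtype=float)
    lowest = factor_values.min()
    if lowest >= 0:
        return factor_values                     # nothing to transform
    return factor_values + 1.0 + abs(lowest)

shifted = shift_if_negative([-2.5, 0.0, 1.5])    # lowest value becomes 1.0
```

Adding a constant makes all values positive without changing the ranking of genotypes or the differences between them.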
The factors were submitted to analysis of variance by the same model used for the original characters. To obtain the predicted gains from selection on the factors, direct selection of the 25 superior progenies was adopted using Eq. 7:

$$GS = h^2 (\bar{X}_s - \bar{X}_o) \qquad (7)$$

where: $h^2$ is the heritability of the character; $\bar{X}_s$ and $\bar{X}_o$ are the means of the selected and original populations, respectively. Analyses of variance were performed using the software Statistical Analysis System (SAS) version 9.2 (SAS 2007). Principal component analysis and factor analysis were performed using the Genes Program - Computational Application in Genetics and Statistics (Cruz 2013).
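The predicted gain can be sketched numerically as the heritability multiplied by the difference between the mean of the 25 selected progenies and the mean of the original population. In the sketch below, the function name, the heritability of 0.5, and the data are hypothetical. Expressing the gain as a percentage of the original mean is an assumption about how GS (%) is obtained.

```python
import numpy as np

def predicted_gain(values, scores, h2, n_selected=25):
    """Predicted gain in a character when selecting the top factor scores."""
    values = np.asarray(values, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Indices of the progenies with the highest factor scores.
    selected = np.argsort(scores)[::-1][:n_selected]
    mean_o = values.mean()                       # original population mean
    mean_s = values[selected].mean()             # selected population mean
    gs = h2 * (mean_s - mean_o)
    gs_percent = 100.0 * gs / mean_o             # assumed definition of GS (%)
    return gs, gs_percent

# Hypothetical data: 160 progenies whose character values equal their scores.
values = np.arange(1.0, 161.0)
gs, gs_percent = predicted_gain(values, scores=values, h2=0.5)
```

With these illustrative numbers the original mean is 80.5, the selected mean is 148, and the predicted gain is 0.5 × 67.5 = 33.75. A negative gain would indicate that selection on the factor moves the character downward.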

RESULTS AND DISCUSSION
All characters showed normal distribution and homogeneity of residual variances at 5% probability (data not shown). The analyses of variance for the two populations revealed variability for all characters, allowing the selection of superior genotypes. The coefficients of variation (CV) ranged from 2.26 (NDM) to 45.56 (HSW) (data not shown). High CV values can be explained by stink bug behavior: the distribution of these insects in the field is not uniform, so some genotypes are more damaged than others. High CV values were also found by Rocha et al. (2014).
To determine the factorial loads, principal component analysis was performed to obtain the eigenvalues and respective eigenvectors for each population. The eigenvalues that together accounted for at least 80% of the total variation among the analyzed characters were retained, a criterion that coincided with retaining the eigenvalues greater than unity. As a result, four factors were obtained for population 1 (Table 1) and three for population 2 (Table 2).
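The two retention criteria mentioned above can be compared with a brief sketch. The eigenvalues below are hypothetical values for ten characters, chosen only so that both criteria agree. They are not the eigenvalues reported in Tables 1 and 2.

```python
import numpy as np

# Hypothetical eigenvalues of a 10 x 10 correlation matrix (they sum to 10).
eigvals = np.array([3.2, 2.4, 1.5, 1.1, 0.7, 0.4, 0.3, 0.2, 0.1, 0.1])

# Criterion 1: number of eigenvalues greater than unity.
n_greater_than_one = int((eigvals > 1.0).sum())

# Criterion 2: smallest number of components reaching 80% of total variation.
cumulative = np.cumsum(eigvals) / eigvals.sum()
n_for_80_percent = int(np.argmax(cumulative >= 0.80)) + 1
```

In this illustration both criteria retain four factors, since the first four eigenvalues exceed 1 and together account for 82% of the total variation. The two criteria need not agree for other data sets.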
Subsequently, factor analysis was performed for each population, and the final factorial loads were obtained after rotation by the varimax method. The communality is the measure of the efficiency of character representation by a common part, also involved in the other analyzed characters (Cruz and Carneiro 2006). According to Souza (1988), values of communality

Number of days for flowering (NDF in days), grain filling period (GP in days), number of days to maturity (NDM in days), plant height at maturity (PHM in cm), lodging (L in notes), leaf retention (LR in notes), agronomic value (AV in notes), grain yield (GY in g), healthy seeds weight (HSW in g) and weight of one hundred seeds (WHS in g).
Furthermore, the preferred direction for each character was defined: an increase is desirable for NDF, AV, GY and HSW, and a decrease for GP, L, LR and WHS, in both populations. Two characters, however, had different preferred directions in each population: in population 1, NDM should remain unchanged and PHM should increase, whereas in population 2 a decrease in both is preferable.
Considering these preferred directions, no factor in either population fully met the requirements for efficient selection, with at most 7 of the 10 directions being met, by Factors 1 and 3 of population 1. Among the factors obtained, the least efficient in directing the characters was Factor 3 of population 2, which met the preferred direction only for PHM.
The best genotypes in each population were selected based on the weighting coefficients of the traits from the scores obtained in each factor (Cruz and Carneiro 2006; DoVale et al. 2011). The results of direct selection on the factors of population 1 can be compared with the confidence interval (CI) of the original population means (Table 6). Some characters responded with a sign opposite to that expected from the factor scores in the formulas, namely NDF in Factors 2 and 3, GP in Factor 3, L and AV in Factor 2, and GY in Factor 3. In this population it would be desirable to increase NDF in order to select genotypes with a longer juvenile period. However, the opposite effect occurred in Factors 2 and 3, which reduced a character whose values are already considered low. An increase in GP was observed in Factor 3, which is undesirable because this is the period when plants are most susceptible to stink bug attack, whereas Factors 2 and 4 did not differ from the original population. Almost all factors selected materials with reduced NDM; the exception was Factor 2, the only one to maintain the original population mean, which is the desirable outcome for this population. Factors 3 and 4 reduced PHM. This character has a high positive correlation with lodging (L), as already reported by Gallon et al. (2016), and the selection gain for PHM had the same effect on the L means. For LR, all factors maintained the parental means, although a reduction in this character would be desirable. According to Silva et al. (2013), LR may be caused by several biotic and abiotic factors, such as stink bug attack, water stress, nutritional imbalance, disease occurrence, and cultivar predisposition. Therefore, reducing this character would select materials that are tolerant to all these factors. AV was reduced by selection in Factors 1 and 2 and kept at the original mean in Factor 3, whereas its preferred direction is an increase, since this character reflects the agricultural suitability of the plant as a whole. Both GY and HSW increased, regardless of the directions of the other characters, except in Factor 4, which kept the original population mean. This indicates that selection can increase the yield and the weight of healthy seeds even when pseudoresistance characters are selected in an unfavorable direction, as with WHS, which responded in all possible directions without changing HSW.

Table 4. Communalities and initial and final factor loads obtained in the factor analysis from the mean values of the characters evaluated in 162 F2:3 soybean genotypes of population 2, Piracicaba-SP, 2014.
Thus, it can be assumed that pseudoresistance mechanisms are not really efficient in conferring resistance to stink bugs, because even when resistance characters were selected in the direction opposite to the desired one, there was an increase in the most important character, HSW. Many effects were similar in population 2, such as reduced NDF and increased GP and WHS in Factors 1 and 3, maintenance of the parental mean for NDM and WHS, increased PHM and L, and reduced AV in Factor 2 (Table 7).
Therefore, none of the factors selected progenies with all characters in the favorable direction, and their use was not advantageous in either population.

Table 7. Estimates of predicted gain percentages (GS (%)), selected population mean ($\bar{X}_s$) by direct selection on the factorial scores, and mean ($\bar{X}_o$) and confidence interval (CI) of the original population for the characters evaluated in 162 F2:3 soybean genotypes of population 2, Piracicaba-SP, 2014.

CONCLUSION
Factor analysis could not provide selection indexes in which all characters showed the desired response. Furthermore, selection on the "supercharacters" led to different selection directions and undesired effects on the original variables, and was not ideal for the studied populations.
Therefore, complementary studies should be done with other selection indices in these populations, in order to establish an index in which all the characters are selected favorably and with satisfactory gains for the application in breeding programs.