Grouping sunflower genotypes for yield, oil content, and reaction to Alternaria leaf spot using GGE biplot

The objective of this work was to evaluate the suitability of the multivariate method of principal component analysis (PCA) using the GGE biplot software for grouping sunflower genotypes for their reaction to Alternaria leaf spot disease (Alternariaster helianthi), and for their yield and oil content. Sixty‐nine genotypes were evaluated for disease severity in the field, at the R3 growth stage, in seven growing seasons, in Londrina, in the state of Paraná, Brazil, using a diagrammatic scale developed for this disease. Yield and oil content were also evaluated. Data were standardized using the software Statistica, and GGE biplot was used for PCA and graphical display of data. The first two principal components explained 77.9% of the total variation. According to the polygonal biplot using the first two principal components and three response variables, the genotypes were divided into seven sectors. Genotypes located on sectors 1 and 2 showed high yield and high oil content, respectively, and those located on sector 7 showed tolerance to the disease and high yield, despite the high disease severity. The principal component analysis using GGE biplot is an efficient method for grouping sunflower genotypes based on the studied variables.


Introduction
Sunflower (Helianthus annuus L.) cropped area is increasing in Brazil, with 144 thousand hectares in the last growing season (Companhia Nacional de Abastecimento, 2014). However, there is an enormous potential for expansion, since sunflower is planted as a second summer crop in succession to soybean, which occupies an area of around 30 million hectares in Brazil (Companhia Nacional de Abastecimento, 2014). The potential for area expansion is also driven by a growing demand for special oils for human consumption, such as high oleic oil, and by the Brazilian government demand for biofuel.
Alternaria leaf spot, caused by Alternariaster helianthi (Hansf.) E.G. Simmons (syn. Alternaria helianthi (Hansf.) Tubaki & Nishihara), can be a threat to sunflower production in Brazil, since it has been prevalent on the crop, occurring in virtually all regions of the country and on all sowing dates. Damage caused by the disease can be due to the reduction of plant photosynthetic area, formation of leaf spots, and early defoliation, which results in the reduction of achene diameter, number of achenes per head, 1000-seed weight, and oil content (Leite et al., 2006; Alves et al., 2013).
Efficient disease control is very difficult once an epidemic is already occurring in the field. Among the strategies for managing the disease, genetic resistance is highly desirable because it does not add costs to farmers, and sometimes it can remove the need for other control methods (Leite et al., 2006). Although regional crop variety trials for an important crop are conducted every year, the tested genotypes vary from year to year, leading to highly unbalanced data across the years. Strategies have been developed to deal with such data (Yan, 2014). It is interesting to compare the results obtained in different crop seasons, considering that these trials have a common standard genotype (Dow M734), evaluated in all years.
As the results of different genotypes are obtained in different years, it is not possible to perform a combined analysis of variance with only one common genotype. In this case, the most appropriate method to group sunflower genotypes, as a function of the evaluated variables, is the multivariate method of principal component analysis (PCA), which allows sorting or grouping without losing information. In addition, it linearly transforms a large number of variables into a smaller set of uncorrelated ones (Silva & Padovani, 2006). Compared with the univariate approach applied to a massive array of data, it allows a study to be interpreted with few components while using the information of all variables (Rao, 1964).
Using PCA to summarize a large number of variables in ecological studies, Prado et al. (2002) measured the importance of each of these variables on each axis or component by its weight, which is correlated with the axis. Such components are expressed as linear functions that reveal the size and shape of the data variation.
Principal component analysis as proposed by Yan & Rajcan (2002), using the software GGE biplot, is based on a singular value decomposition (SVD) restricted to the first two principal components, which means that a matrix is decomposed into three parts: the singular values, which form a diagonal matrix, the eigenvectors of the columns, and the eigenvectors of the rows. Biplot analysis was first developed by Gabriel (1971), and it displays multiple variables as a function of different treatments in the same graph (Akinwale et al., 2014). GGE biplot has been used to show the performance of sunflower hybrids in different testing environments or sowing dates (Ullah et al., 2007; Balalic et al., 2012; Brankovic et al., 2012). A GGE biplot can also be used to analyze multi-year data of variety trials, where genotypes vary from year to year, treating each year-location as an environment (Yan, 2014).
The objective of this work was to evaluate the suitability of the multivariate method of principal component analysis, using the software GGE biplot, for grouping sunflower genotypes for their reaction to Alternaria leaf spot disease (Alternariaster helianthi), and for their yield and oil content.

Materials and Methods
The trials were sown in November (2002, 2004, and 2005), and in October (2007, 2008, 2009, 2011, and 2012). Each plot consisted of four 4-meter rows spaced at 0.80 m, with three plants per linear meter. The recommendations for sunflower cultivation were followed, including fertilization, weed control, spraying against insects, and irrigation when necessary. No artificial inoculation of A. helianthi was performed, since the disease occurred by natural infection of plants by the fungus. The pathogen was identified through laboratory isolation and inoculation on plants in a greenhouse.
Assessments of disease severity (%) were performed in the two central rows of each plot, discarding 0.5 m from each row end. The individual plant system was adopted (Kranz & Jörg, 1989), by which five homogeneous plants in each plot were marked.
Plants were chosen during the V4 stage (Schneiter & Miller, 1981), and an attempt was made to select individuals of the same development stage, height, and vigor. Total leaf area was estimated on marked plants (Leite & Amorim, 2002) at the developmental stage R3 (Schneiter & Miller, 1981). Alternaria disease severity (%) was estimated simultaneously on all leaves, at the R3 growth stage, using a previously developed and validated diagrammatic scale for the disease (Leite & Amorim, 2002), as recommended by Leite et al. (2006). Plants were harvested individually, after the physiological maturity stage (R9) (Schneiter & Miller, 1981), and yield (kg ha⁻¹) was evaluated at 11% moisture. Oil content (%) was predicted by NIR spectroscopy (Grunvald et al., 2014).
Precipitation was measured during the seven growing seasons (Figure 1), since water deficiency could affect sunflower production and disease development. As data were obtained in different years, original data of the variables were standardized within each growing season, in order to minimize the effect of the environment, and to reduce experimental variability. This procedure was performed using the software Statistica (Statsoft, 1995), after the normalization of the variables (yield in kg ha⁻¹, Alternaria disease severity in %, and oil content in %) to the same scale as a normal distribution with mean zero and standard deviation one [N(0, 1)], ensuring that they are dimensionless. Pearson correlation coefficients for original data were assessed by t test, at 5% probability.
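The within-season standardization described above can be illustrated with a minimal Python sketch. It is not the Statistica procedure used in this work: the function name `standardize_within_season`, the record layout, and the numeric values are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_season(records, variables):
    """Return copies of the records with each variable turned into a z-score
    computed separately within each growing season."""
    by_season = defaultdict(list)
    for rec in records:
        by_season[rec["season"]].append(rec)

    standardized = []
    for group in by_season.values():
        # Mean and sample standard deviation (assumed n - 1) per variable.
        stats = {v: (mean(r[v] for r in group), stdev(r[v] for r in group))
                 for v in variables}
        for rec in group:
            z = dict(rec)
            for v in variables:
                m, s = stats[v]
                z[v] = (rec[v] - m) / s
            standardized.append(z)
    return standardized


# Invented values for illustration only; they are not data from the trials.
records = [
    {"season": "A", "genotype": "g1", "yield": 1000.0, "severity": 5.0, "oil": 40.0},
    {"season": "A", "genotype": "g2", "yield": 1500.0, "severity": 10.0, "oil": 44.0},
    {"season": "A", "genotype": "g3", "yield": 2000.0, "severity": 15.0, "oil": 48.0},
    {"season": "B", "genotype": "g1", "yield": 800.0, "severity": 20.0, "oil": 38.0},
    {"season": "B", "genotype": "g4", "yield": 1200.0, "severity": 12.0, "oil": 42.0},
    {"season": "B", "genotype": "g5", "yield": 1300.0, "severity": 16.0, "oil": 43.0},
]
z_records = standardize_within_season(records, ["yield", "severity", "oil"])
```

After this step, every variable has mean zero and unit standard deviation within each season, so values from different years are expressed on the same dimensionless scale.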
Principal component analysis and biplot graphics were obtained from the data matrix of sunflower genotypes by singular value decomposition (SVD), focusing on the treatments (combinations of genotype and year). Eigenvalues and eigenvectors of PCA were calculated for yield, Alternaria leaf spot severity, and oil content, using GGE biplot (Yan & Kang, 2003; Yan et al., 2015).
Afterwards, the resulting first two principal components (PC1 and PC2) were taken to perform the biplot analysis and graphical display of data, using the GGE biplot software. The biplot was constructed with PC1 scores on the abscissa and PC2 scores on the ordinate for each treatment and each variable (Yan & Rajcan, 2002), and the model can be expressed as:

$$\frac{T_{ijk} - \bar{T}_{jk}}{s_{jk}} = \phi_{ik1}\tau_{jk1} + \phi_{ik2}\tau_{jk2} + \varepsilon_{ijk}$$

in which: $T_{ijk}$ is the average value of the combination of genotype and year ik, for trait j; $\bar{T}_{jk}$ is the average value of the combination of trait and year jk over all genotypes; $s_{jk}$ is the standard deviation of the interaction between trait j and year k, among the genotype averages; $\phi_{ik1}$ and $\phi_{ik2}$ are the PC1 and PC2 scores, respectively, for genotype i; $\tau_{jk1}$ and $\tau_{jk2}$ are the PC1 and PC2 scores, respectively, for trait j; and $\varepsilon_{ijk}$ is the residual of the model associated with the interaction of genotype and year ik in trait j.
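As an illustration of the decomposition described in this section, the following sketch standardizes a small treatment-by-trait table and extracts the first two principal components by SVD with NumPy. It is not the GGE biplot software: the function name `biplot_scores`, the example values, and the symmetric partition of the singular values between treatment and trait scores are assumptions made for the example.

```python
import numpy as np


def biplot_scores(table, n_components=2):
    """Standardize each trait (column) and split the table into treatment
    scores, trait scores, explained-variance shares, and residuals."""
    x = np.asarray(table, dtype=float)
    # Column-wise standardization: subtract the trait mean, divide by its SD.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # Assumed scaling: the square root of each singular value is assigned
    # to both the row (treatment) and the column (trait) scores.
    scale = np.sqrt(s[:n_components])
    row_scores = u[:, :n_components] * scale
    col_scores = vt[:n_components].T * scale
    explained = s ** 2 / np.sum(s ** 2)
    residual = z - row_scores @ col_scores.T
    return row_scores, col_scores, explained, residual


# Invented table: rows are treatments, columns are yield, severity, oil content.
example = [
    [1400.0, 8.0, 48.0],
    [700.0, 12.0, 33.0],
    [1800.0, 19.0, 41.0],
    [1200.0, 3.0, 45.0],
    [1600.0, 22.0, 39.0],
    [1000.0, 6.0, 42.0],
]
rows, cols, explained, residual = biplot_scores(example, n_components=2)
```

Plotting the two columns of `rows` and `cols` on the same axes gives a biplot; `residual` holds the part of the standardized data that the first two components do not reproduce.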

Results and Discussion
The Pearson correlation coefficient between disease severity and yield was negative but not significant (r = -0.21). Results of principal component analysis are more efficient when the original data of the studied variables are correlated. The correlation between disease severity and oil content was negative and significant (r = -0.32), and that between yield and oil content was positive and significant (r = 0.28). This indicates that PCA is a reasonable option to quantify the observed variables, given the complex variation structure within and among them (Silva & Padovani, 2006).
The eigenvalues obtained for the three components were, respectively, 1.5423, 0.7949, and 0.6626, which together account for 100% of the total variance (Table 2). Based on these eigenvalues, the results of PCA indicated that the first component accounted for 51.4% and the second one for 26.5% of the total variance among variables.
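The percentages above follow from dividing each eigenvalue by the sum of all eigenvalues. A short Python check of this arithmetic, using only the three eigenvalues reported in the text (the variable names are illustrative):

```python
# Eigenvalues reported for the three principal components.
eigenvalues = [1.5423, 0.7949, 0.6626]

total = sum(eigenvalues)
# Share of the total variance attributed to each component, in percent.
shares = [100 * value / total for value in eigenvalues]
# Cumulative share of the first two components, which define the biplot.
cumulative_two = shares[0] + shares[1]

print([round(share, 1) for share in shares], round(cumulative_two, 1))
```

With three standardized variables, the eigenvalues sum to approximately 3, so each share is close to the eigenvalue divided by 3.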
In principal component analysis, the number of principal components is always equal to the number of variables considered in the research; however, the number of components or selected axes is not always equal to the maximum number of variables. Usually, the first two components explain the importance of a larger number of variables in the total variation, and the first component is the most important because it has the greatest contribution to the data variation (Silva & Padovani, 2006). In the present research, the first two principal components or two axes explained 77.9% of the total variation. Therefore, the first component, to which disease severity and oil content contributed the most, was the most important (Table 2). The second component represents the contribution of disease severity and yield to compare sunflower genotypes. Akinwale et al. (2014) state that no studies have been carried out to specify when the proportion of variation explained by a biplot becomes too small to make a valid conclusion; however, it is generally assumed that any proportion below 40% is too small. This shows the importance of PC1, as reported by Ullah et al. (2007), who noted that ideal sunflower cultivars should have a large PC1 score and a small (absolute) PC2 score.
Yan & Tinker (2006) developed studies on genotype x environment interaction, using GGE biplot, in order to verify the selection of the best genetic material that shows stability in different environments. One of the most attractive features of a GGE biplot is its ability to show the which-won-where pattern of a genotype by environment dataset. Many researchers find this use of a biplot intriguing, as it graphically addresses important concepts such as genotype x environment interaction, mega environment differentiation, and specific adaptation. In fact, the GGE biplot graphic facilitates the visual evaluation of both genotype and genotype x environment interaction, showing different sunflower genotype groups based on their performance (Ullah et al., 2007).
Based on the two principal components with the three investigated variables, the polygon was formed by connecting the markers of the genotypes that were farthest from the biplot origin, such that all other genotypes were contained in the polygon. Genotypes located on the vertices of the polygon performed either the best or the poorest in one or more locations, since they had the longest distance from the origin of the biplot. The perpendicular lines are equality lines between adjacent genotypes on the polygon, which facilitate visual comparison of them. The equality lines divide the biplot into sectors, and the winning genotype for each sector is the one located on the respective vertex (Yan & Tinker, 2006; Farshadfar et al., 2011).
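Geometrically, a polygon that connects the outermost markers and contains all the others is the convex hull of the genotype scores. The sketch below finds the vertex genotypes with Andrew's monotone chain algorithm; it is an illustration under stated assumptions, not the routine used by the GGE biplot software, and the function names, labels, and coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def polygon_vertices(scores):
    """Return the labels of the markers on the convex hull, in
    counter-clockwise order. `scores` maps a label to (PC1, PC2)."""
    points = sorted(scores.items(), key=lambda item: item[1])
    if len(points) <= 2:
        return [label for label, _ in points]

    def half_hull(sequence):
        hull = []
        for item in sequence:
            # Drop the last marker while it does not make a left turn.
            while len(hull) >= 2 and cross(hull[-2][1], hull[-1][1], item[1]) <= 0:
                hull.pop()
            hull.append(item)
        return hull

    lower = half_hull(points)
    upper = half_hull(reversed(points))
    # The last marker of each half is the first marker of the other half.
    return [label for label, _ in lower[:-1] + upper[:-1]]


# Invented scores: four outer markers and one marker at the biplot origin.
scores = {"a": (-1.0, -1.0), "b": (1.0, -1.0), "c": (1.0, 1.0),
          "d": (-1.0, 1.0), "e": (0.0, 0.0)}
vertices = polygon_vertices(scores)
```

In this example, the marker at the origin is enclosed by the polygon and is therefore not returned as a vertex, which matches the reading that genotypes near the origin contribute little.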
In the biplot using the first two principal components, the variables disease severity, yield, and oil content were located on three different sectors (Figure 2), within the third and fourth concentric circles (Figure 3). The concentric circles on the biplot help to visualize the vector length (the distance from a marker to the biplot origin) (Yan et al., 2015), and also show the discriminating abilities of the variables (Jalata, 2011). Treatments with longer vectors indicate higher contributions and also higher variances. Genotypes located on the vertices close to the variables were observed as the most responsive ones (Table 3). Opposite effects were observed when genetic materials were placed on vertices located on the opposite side of the studied variables.
According to the polygonal biplot using the first two principal components, considering the variables disease severity, yield, and oil content, the evaluated genotypes were divided into seven sectors (Figure 2). The first sector contained the yield vector. The genotype Dow MG52 (C9) was located on the vertex of the sector, showing 1,435 kg ha⁻¹ yield and 50.55% oil content. Genotypes located closer to the origin of the biplot showed low contribution, such as 'BRS Gira 32' (F6) (Table 3). The second sector contained the oil content vector, and few genotypes were located in this position, which represents high oil content (Figure 2). On the vertex of the polygon was V20041 (E9), with 44.97% oil content. The third sector represented low Alternaria leaf spot severity because it was located on the opposite side of the fifth sector, where the disease severity vector was located (Figure 2). Sunflower genotype Helio 358 (B6) was located on the vertex of the polygon (disease severity of 3.29%).
Yan & Tinker (2006) stated that the length of the genotype vector, which is the distance between a genotype and the biplot origin, measures the difference of the genotype from the "average" genotype. Therefore, genotypes, or any treatments or variables, with the longest vectors are either the best or the poorest ones. Although located on the vertices of the polygon, they are not always the best performers. If they are located on the left side of the biplot, these genotypes show the worst values, and care should be taken to avoid an erroneous interpretation. This was observed in the fourth sector (Figure 2), where the open-pollinated variety Embrapa 122 (B2) was located on the vertex of the polygon, showing a long vector (Figure 3), but its values of yield (715 kg ha⁻¹) and oil content (33.17%) were very low. Moreover, 'BRS Gira 31' (F5) was located on the vertex of the fifth sector, which represents low oil content. The sixth sector represented high Alternaria leaf spot severity, and HLA 04 (D10) was located on the sector vertex. Other genotypes located in this sector were Agrobel 910 (A8) and BRS Gira 36 (G10) (Figure 2).
The sunflower hybrid Dow M734 was the common genotype included in all trials; it showed a general pattern, since in four out of seven growing seasons it was located in the seventh sector (Figure 2), showing high disease severity (up to 19.25%) and high yield (up to 1,852 kg ha⁻¹). Karimizadeh et al. (2013) observed the same in lentil genotypes: GGE biplots constructed similarly for individual years indicated that, in each year, some of the treatments fell into different sectors and some fell into similar sectors, but the general pattern of location groupings did not vary across the years.

Table 3. Contribution of grain yield (kg ha⁻¹), Alternaria leaf spot severity (%), and oil content (%) of 69 sunflower genotypes according to principal components PC1, PC2, and PC3, in descending order.

'BRS Gira 27' (E6) was located on the vertex of the seventh sector and showed the highest yield in the 2009/2010 growing season, despite a disease severity of 22.51%. This sector contained genotypes showing tolerance to Alternaria disease, that is, the ability of plants to produce a good crop even when they are infected with a pathogen (Agrios, 1997).
According to the agronomical performance, genotypes located in sectors 1 and 2 showed high yield and high oil content, respectively, and should preferably be selected. Genotypes showing tolerance to the disease, found in sector 7, may also be selected.
The present results have several implications for future breeding, genotype evaluation, and recommendation of sunflower hybrids. GGE biplot has greatly helped in the accurate analysis and data interpretation from breeding and agronomic field experiments (Akinwale et al., 2014), and offered the opportunity to identify adapted sunflower genotypes for resistance to Alternaria leaf spot that can be used in sunflower breeding programs.

Conclusions
1. GGE biplot analysis allows for a meaningful and useful presentation, and is an efficient method for grouping sunflower genotypes, as a function of Alternaria leaf spot severity, yield, and oil content.
2. GGE biplot is adequate to display different sunflower genotype groups based on their agronomical performance.

Figure 1. Accumulated precipitation (mm) for the seven growing seasons in Londrina, Brazil.

Figure 2. Polygon view of the GGE biplot based on 69 sunflower genotypes evaluated for grain yield, oil content, and reaction to Alternaria leaf spot disease, in seven growing seasons. Codes and details for the genotypes are listed in Table 1.

Figure 3. Vectors of the GGE biplot, based on 69 sunflower genotypes evaluated for grain yield, oil content, and reaction to Alternaria leaf spot disease in seven growing seasons. Codes and details for the genotypes are listed in Table 1.

Table 1. Sunflower genotypes evaluated for grain yield, oil content, and reaction to Alternaria leaf spot in seven growing seasons.