Choosing parent tropical wheat genotypes through genetic dissimilarity based on REML/BLUP

Parent selection is a crucial step in breeding programs. In the present study, we evaluated the genetic diversity in tropical wheat genotypes using best linear unbiased predictions (BLUPs) by different grouping methods. We identified potential parents to compose a crossing block with the aim of improving wheat for the Brazilian Cerrado. A total of 41 tropical wheat genotypes were evaluated in a field experiment. The evaluated traits included days to flowering; disease symptoms of fusarium head blight, blast, and leaf rust; flag leaf height; plant height; spike mass; hectoliter weight; and grain yield. The BLUPs were estimated and, from these, the standardized average Euclidean distance was calculated. Then, UPGMA, Tocher, and principal component clusters were generated from this genotypic distance matrix. Evaluating genetic diversity based on BLUP allowed the identification of two groups of highly dissimilar genotypes with high estimated genotypic values with which to compose a partial diallel.


INTRODUCTION
Brazil produces about 5.3 million tons of wheat annually, whereas national consumption exceeds 12 million tons. The resulting deficit in the national market is covered by imports, and the need for production alternatives capable of supplying the domestic market has become apparent. Currently, southern Brazil contains 90% of the total wheat cultivation area, which is primarily concentrated in the states of Paraná and Rio Grande do Sul (CONAB 2020). In 2019, the National Supply Company of Brazil (CONAB) recorded that the southeastern and midwestern regions cultivated 227.40 thousand ha of wheat. These regions, although not traditional wheat producers, show strong potential for further cultivation of the crop (Pasinato et al. 2018). The expansion of wheat cultivation to new frontiers depends on the development of cultivars adapted to these new locations. The average genetic gain in grain yield of irrigated wheat in tropical Brazilian regions was 48 kg ha⁻¹ year⁻¹ between 1976 and 2005 (Cargnin et al. 2008), showing that improvements have been occurring with the development of cultivars suited to tropical regions. Furthermore, Bornhofen et al. (2018) identified a genetic progress of 34.8 kg ha⁻¹ year⁻¹, which depended greatly on environmental variables.

CR Casagrande et al.
The best linear unbiased predictor (BLUP) method, which uses variance components estimated via restricted maximum likelihood (REML), has allowed breeders to select and predict genetic values more efficiently and accurately because it generates robust estimates. The efficiency of using mixed models in wheat breeding programs has been evaluated by Pimentel et al. (2014) and Pagliosa et al. (2017). Genetic diversity studies are important in wheat breeding programs because selecting more divergent parents for crossing results in greater variability in the segregating population and, consequently, a greater probability of alleles regrouping in favorable combinations. Genetically complementary parents are sought to combine morpho-agronomic traits of interest. Scherlosky et al. (2018) observed the existence of genetic variability in wheat crops over four decades of plant breeding. In addition, Chaves et al. (2020) revealed strong molecular diversity when evaluating different Brazilian wheat cultivars. Genetic diversity can indicate promising crosses, as shown in a study of 30 bread wheat genotypes in a research center in India (Kumar et al. 2013). To the best of our knowledge, there are currently no studies in the literature that have assessed genetic diversity to establish crossing blocks in wheat. In addition, all existing studies using phenotypic data rely on conventional analysis of variance and phenotypic means, which may not represent genetic distance in most cases. A more promising strategy than classical phenotypic analysis is the analysis of molecular diversity, which has been widely applied in wheat in recent years (Spanic et al. 2016). Tadesse et al. (2019) concluded that the key to increasing genetic gain in wheat lies in crossing divergent parents with high frequencies of favorable alleles.
As there is great potential for the expansion of wheat cultivation areas in tropical regions, there is a need to develop genotypes that are adapted to the edaphoclimatic particularities of these regions and that also demonstrate high agronomic performance (Pereira et al. 2019). With this in mind, the objectives of the present study were to evaluate wheat genetic diversity using standardized average Euclidean distances calculated from BLUPs and to use the unweighted pair group method with arithmetic mean (UPGMA), Tocher, and principal component grouping methods to select the most promising parents to compose a crossing block.

Wheat genotypes
The evaluated genotypes consisted of 32 tropical wheat lines at the preliminary (EPL) and value of cultivation and use (VCU) stages, developed by the UFV Wheat Breeding Program, and the following nine commercial cultivars from different breeding companies that are widely grown in the central-south and Cerrado regions of Brazil: BRS 394, BRS 264, BRS 254 (EMBRAPA), CD 1303 (COODETEC), TBIO Aton, TBIO Duque, TBIO Ponteiro, TBIO Sintonia, and TBIO Sossego (Biotrigo Genética). The experiment was conducted at the Universidade Federal de Viçosa, Viçosa, Minas Gerais, in the Professor Diego Alves de Mello experimental field (lat 20º 45' 14'' S, long 42º 52' 55'' W, alt 648 m asl). Sowing was carried out mechanically on June 10, 2019, and harvesting occurred on October 6, 2019. The genotypes were arranged in a randomized block design with three replications. The experimental plots were composed of five rows of 5 m in length, with an inter-row spacing of 0.20 m and a population density of 400 seeds m⁻². Measurements were taken only in the three central rows.

Management
Basic fertilization was carried out according to the chemical composition of the soil to meet the requirements of the crop. At sowing, 300 kg ha⁻¹ of formula 08-28-16 (nitrogen, phosphorus, potassium) was applied in the furrows. As topdressing, 90 kg ha⁻¹ of nitrogen was applied in two stages, 50% at the beginning of tillering and 50% at the start of booting, at stages 21 and 45 of the Zadoks et al. (1974) scale, respectively. The nitrogen source used was urea (45% N), totaling 200 kg ha⁻¹ of urea. Chemical control of weeds was performed using metsulfuron-methyl as the active ingredient, at a dosage of 5 g ha⁻¹ of commercial product approximately 20 days after sowing. No chemical control of diseases was carried out, so that the natural resistance of the genotypes could be observed. The experiment was carried out under sprinkler irrigation according to the water needs of the genotypes.
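The fertilizer doses reported above can be cross-checked with simple arithmetic. The following is a minimal sketch only: the function names are hypothetical, and it assumes the conventional reading of the 08-28-16 grade as percentages of N, P2O5, and K2O.

```python
def nutrient_supplied(product_kg_ha: float, content_percent: float) -> float:
    """Nutrient mass (kg/ha) delivered by a product dose at a given content."""
    return product_kg_ha * content_percent / 100.0


def product_required(nutrient_kg_ha: float, content_percent: float) -> float:
    """Product dose (kg/ha) needed to deliver a target nutrient mass."""
    return nutrient_kg_ha * 100.0 / content_percent


# Sowing: 300 kg/ha of formula 08-28-16.
# Assumption: the grade is read as % N, % P2O5, % K2O.
sowing = {
    nutrient: nutrient_supplied(300.0, percent)
    for nutrient, percent in (("N", 8.0), ("P2O5", 28.0), ("K2O", 16.0))
}

# Topdressing: 90 kg/ha of N supplied as urea with 45% N.
urea_kg_ha = product_required(90.0, 45.0)

print(sowing)
print(urea_kg_ha)
```

Under these assumptions, the urea dose computed from 90 kg ha⁻¹ of N at 45% N agrees with the 200 kg ha⁻¹ stated in the text.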

Traits evaluated
The following traits were measured on plants in the three central rows of each plot. Days to flowering (DF) were observed from phase 10 to phase 65 of the Zadoks et al. (1974) scale and were counted when 50% of the plants in the plot presented flowering spikes. Disease symptoms (DN) of fusarium head blight (Fusarium graminearum), blast (Magnaporthe oryzae), and leaf rust (Puccinia triticina) were scored according to the severity of disease on the leaves and spikes, with 5 denoting the strongest resistance (free of disease) and 1 the weakest (high disease intensity). Flag leaf height (FLH) and plant height (PH) were measured in centimeters at harvest, from the coleoptile to the insertion of the flag leaf and to the tip of the spike (excluding the awns), respectively. Spike mass (SM) was measured after harvest using a scale with a precision of 0.001 g. Height and SM measurements were performed on 10 plants harvested at random from the experimental plot. Hectoliter weight (HW) was determined using a Dalle Molle hectoliter scale and was expressed in kg 100 L⁻¹. Grain yield (GY) was determined in kg ha⁻¹, with adjustment to 13% moisture in all plots.
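As an illustration of how a plot harvest could be converted to grain yield at 13% moisture, the sketch below uses the usual dry-matter correction. The correction formula, the function names, and the assumption that the harvested area equals the three central rows (3 × 0.20 m × 5 m) are not stated in the text and are labeled here as assumptions; the input values in the usage lines are hypothetical.

```python
ROWS_HARVESTED = 3       # three central rows (assumed to be the harvested area)
ROW_SPACING_M = 0.20     # inter-row spacing reported in the text
ROW_LENGTH_M = 5.0       # row length reported in the text
TARGET_MOISTURE = 13.0   # reference moisture (%) reported in the text


def plot_area_m2() -> float:
    """Harvested area per plot under the stated assumption."""
    return ROWS_HARVESTED * ROW_SPACING_M * ROW_LENGTH_M


def grain_yield_kg_ha(grain_g: float, moisture_percent: float) -> float:
    """Plot grain mass (g) at the observed moisture -> kg/ha at 13% moisture.

    Assumption: standard dry-matter correction,
    adjusted = raw * (100 - observed) / (100 - target).
    """
    raw_kg_ha = (grain_g / 1000.0) / plot_area_m2() * 10_000.0
    return raw_kg_ha * (100.0 - moisture_percent) / (100.0 - TARGET_MOISTURE)


# Hypothetical plot: 1500 g of grain harvested at 15% moisture.
print(round(grain_yield_kg_ha(1500.0, 15.0), 2))
```

A plot already at 13% moisture is left unchanged by the correction, which is a quick sanity check on the formula.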

Biometric analyses
Data were submitted to deviance analysis to estimate genetic parameters, genotypic values, and confidence intervals of genotypic values using the REML/BLUP methodology. The genetic-statistical model used to estimate the variance components and to predict the genotypic values was model 21 in the Selegen software (Resende 2016):

$$y = Xr + Zg + e$$

where $y$ is the data vector; $r$ is the vector of repetition effects (assumed to be fixed) plus the general average; $g$ is the vector of genotypic effects (assumed to be random), $g \sim N(0, \sigma^2_g)$, where $\sigma^2_g$ is the genotypic variance; $e$ is the vector of (random) errors or residuals, $e \sim N(0, \sigma^2_e)$, where $\sigma^2_e$ is the residual variance; and $X$ and $Z$ are the incidence matrices for these effects.
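To make the prediction step concrete, the sketch below computes BLUPs for a balanced randomized block design, in which the BLUP of each genotype reduces to its deviation from the general mean shrunk by the mean-basis heritability, $h^2_{mg} = \sigma^2_g / (\sigma^2_g + \sigma^2_e / b)$, where $b$ is the number of blocks. It is not the Selegen implementation: REML estimation is not performed, the variance components are taken as given, and the data and function name are hypothetical.

```python
from statistics import mean


def blup_balanced_rcbd(plots, var_g, var_e):
    """BLUPs for balanced complete-block data.

    plots: dict mapping genotype -> list of plot values, one per block.
    var_g, var_e: genotypic and residual variances (assumed known here;
    in the study they are estimated by REML).
    Returns (blups, general_mean, shrinkage).
    """
    n_blocks = len(next(iter(plots.values())))
    shrinkage = var_g / (var_g + var_e / n_blocks)  # mean-basis heritability
    general_mean = mean([v for values in plots.values() for v in values])
    blups = {
        genotype: shrinkage * (mean(values) - general_mean)
        for genotype, values in plots.items()
    }
    return blups, general_mean, shrinkage


# Hypothetical data: two genotypes, three blocks.
plots = {"G1": [10.0, 11.0, 12.0], "G2": [7.0, 8.0, 9.0]}
blups, general_mean, shrinkage = blup_balanced_rcbd(plots, var_g=3.0, var_e=3.0)

# Predicted genotypic value = general mean + BLUP of the genotypic effect.
genotypic_values = {g: general_mean + b for g, b in blups.items()}
print(shrinkage, blups, genotypic_values)
```

The shrinkage factor pulls genotype means toward the general mean more strongly as the residual variance increases relative to the genotypic variance, which is why distances computed from BLUPs differ from those computed from phenotypic means.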
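The standardized average Euclidean distance between each pair of genotypes was calculated for the 41 genotypes from the predicted BLUP values. Each value was first standardized as

$$y_{ij} = \frac{Y_{ij}}{\hat{\sigma}_j}$$

where $Y_{ij}$ is the genotypic value of the $i$th genotype for trait $j$ and $\hat{\sigma}_j$ is the standard deviation associated with the $j$th trait. The distance was then obtained as

$$d_{ii'} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_{ij} - y_{i'j}\right)^2}$$

where $d_{ii'}$ is the average Euclidean distance based on standardized data, $n$ is the number of traits analyzed, and $y_{ij}$ is the standardized observation of the $i$th genotype for the $j$th trait.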
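The distance calculation described above can be sketched as follows. This is an illustrative implementation rather than the routine used in the Genes software: each trait is divided by its standard deviation across genotypes (the sample standard deviation is assumed here), and the squared differences are averaged over the number of traits before taking the square root. The function name and the toy values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standardized_mean_euclidean(values):
    """Pairwise standardized average Euclidean distances.

    values: dict mapping genotype -> list of BLUPs, same trait order for all.
    Returns a nested dict dist[a][b].
    """
    names = list(values)
    n_traits = len(values[names[0]])
    # Standard deviation of each trait across genotypes (assumption: sample SD).
    sds = [stdev([values[g][j] for g in names]) for j in range(n_traits)]
    z = {g: [values[g][j] / sds[j] for j in range(n_traits)] for g in names}
    return {
        a: {
            b: sqrt(sum((za - zb) ** 2 for za, zb in zip(z[a], z[b])) / n_traits)
            for b in names
        }
        for a in names
    }


# Hypothetical BLUPs for three genotypes and two traits.
toy = {"G1": [1.0, 10.0], "G2": [2.0, 20.0], "G3": [3.0, 30.0]}
dist = standardized_mean_euclidean(toy)
print(dist["G1"]["G2"], dist["G1"]["G3"])
```

Dividing by the number of traits keeps the distances comparable between studies that evaluate different numbers of traits.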
Thus, a g × g distance matrix was obtained, where g = 41. Then, Tocher's optimization grouping method and the hierarchical average-linkage method (UPGMA) were applied. In the latter, the optimal number of groups was determined by the methodology proposed by Mojena (1977), and k = 1.25 was adopted as the stopping rule for defining the number of groups, as suggested by Milligan and Cooper (1985). The association between the cophenetic matrix generated by the UPGMA methodology and the original distance matrix (Euclidean distance) was measured by the cophenetic correlation coefficient, and its significance was determined by the Mantel test with 10,000 permutations.
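The hierarchical step can be illustrated with a small pure-Python sketch of UPGMA followed by the Mojena stopping rule, here taken as cutting the dendrogram at the first fusion level that exceeds the mean of the fusion levels plus k times their standard deviation. It is a didactic sketch, not the implementation in Genes or R; it does not cover the Tocher method, the cophenetic correlation, or the Mantel test, and the toy distance matrix and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def average_link(dist, a, b):
    """Mean of all pairwise distances between members of clusters a and b."""
    return mean([dist[x][y] for x in a for y in b])


def upgma_fusions(dist, names):
    """Average-linkage agglomeration; returns [(fusion_level, members), ...]."""
    clusters = [(name,) for name in names]
    fusions = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_link(dist, *pair))
        fusions.append((average_link(dist, a, b), a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return fusions


def mojena_number_of_groups(fusions, k=1.25):
    """Number of groups left when fusions above mean + k * sd are rejected."""
    levels = [level for level, _ in fusions]
    threshold = mean(levels) + k * stdev(levels)
    accepted = sum(level <= threshold for level in levels)
    n_items = len(fusions) + 1
    return n_items - accepted, threshold


# Hypothetical matrix: two tight triplets (distance 1) far apart (distance 10).
names = ["A", "B", "C", "D", "E", "F"]
triplet = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
dist = {
    x: {y: 0 if x == y else (1 if triplet[x] == triplet[y] else 10)
        for y in names}
    for x in names
}

fusions = upgma_fusions(dist, names)
n_groups, threshold = mojena_number_of_groups(fusions)
print([level for level, _ in fusions], round(threshold, 2), n_groups)
```

Because UPGMA fusion levels do not decrease along the agglomeration, counting the accepted fusions is equivalent to cutting the dendrogram at the threshold.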
Principal component analysis was carried out to identify the traits that explained the total variation to greater and lesser extents and to serve as a grouping method. For this, a scatter plot was generated from the first principal components, allowing the groups formed to be visualized. The genotypic correlation matrix between the traits was evaluated using a correlation network. The adjacency matrix $A = h(R)$ was used to determine the connections between the traits, with the following function:

$$h(R) = \frac{1}{2}\left\{\mathrm{SNG}\left(|R| - \rho\right) + 1\right\}$$

where SNG is the sign function and $\rho$ is the parameter that determines the minimum value for a correlation to be represented in the correlation network. In this study, the value of $\rho$ was set to zero to ensure that all relationships between traits were included. The thickness of the lines represents the magnitude of the association between the traits, with a cutoff value of 0.30, meaning that only correlations $|r_{ij}| \geq 0.30$ were represented by highlighted lines. Positive associations were colored green and negative associations red. These analyses were performed using the Genes (Cruz 2016) and R (R Core Team 2019) software, and figures were prepared using the SigmaPlot 14.0 software.
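The thresholding behind the correlation network can be sketched as below, assuming that the adjacency function is the hard threshold $h(R) = \frac{1}{2}\{\mathrm{sgn}(|R| - \rho) + 1\}$ and that highlighted edges are those with $|r_{ij}| \geq 0.30$. The correlation values and function names are hypothetical, and the sketch does not reproduce the plotting done in Genes or R.

```python
def sign(x: float) -> int:
    """Sign function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def adjacency(corr, rho=0.0):
    """Hard-threshold adjacency: 1 where |r| > rho, 0 where |r| < rho."""
    return [[0.5 * (sign(abs(r) - rho) + 1) for r in row] for row in corr]


def highlighted_edges(traits, corr, cutoff=0.30):
    """Trait pairs drawn with highlighted lines, colored by correlation sign."""
    edges = []
    for i in range(len(traits)):
        for j in range(i + 1, len(traits)):
            r = corr[i][j]
            if abs(r) >= cutoff:
                edges.append((traits[i], traits[j], "green" if r > 0 else "red"))
    return edges


# Hypothetical genotypic correlations among three traits.
traits = ["DF", "PH", "SM"]
corr = [
    [1.0, 0.8, -0.5],
    [0.8, 1.0, 0.1],
    [-0.5, 0.1, 1.0],
]

print(adjacency(corr, rho=0.0))
print(highlighted_edges(traits, corr))
```

With ρ = 0, every nonzero correlation is kept as a connection, and the 0.30 cutoff then only controls which connections are visually emphasized.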

RESULTS AND DISCUSSION
Deviance analysis revealed a significant effect of genotype for all evaluated traits according to the likelihood ratio test at 1% probability (Table 1), indicating genetic variability among the wheat genotypes. These results support the analysis of genetic diversity. For most of the evaluated traits, apart from GY, genetic variance predominated in the total phenotypic variation. Heritability estimates on a genotype-mean basis ranged from 0.70 (GY) to 0.92 (DF). Therefore, according to the classification proposed by Resende and Duarte (2007), the selective accuracy values were classified as very high (>0.90) for the DF, DN, FLH, SM, and HW traits, whereas the selective accuracy values of the other traits (PH and GY) were considered high (>0.70). These high selective accuracy estimates indicate strong precision in genotype selection. These results are also consistent with the residual coefficient of variation estimates and classify the present study as a high-precision experiment.
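The link between the heritability estimates and the accuracy classes can be illustrated numerically, given that selective accuracy is the square root of the mean-basis heritability. The sketch below uses only the two thresholds quoted in the text (>0.90 and >0.70); the label for values below 0.70 and the function names are placeholders, not part of the cited classification.

```python
from math import sqrt


def selective_accuracy(h2_mean: float) -> float:
    """Selective accuracy as the square root of mean-basis heritability."""
    return sqrt(h2_mean)


def accuracy_class(accuracy: float) -> str:
    """Classes using only the two thresholds quoted in the text."""
    if accuracy > 0.90:
        return "very high"
    if accuracy > 0.70:
        return "high"
    return "below 0.70"  # placeholder label, not from the source


# Extreme heritability estimates reported in the text: GY and DF.
for trait, h2 in (("GY", 0.70), ("DF", 0.92)):
    acc = selective_accuracy(h2)
    print(trait, round(acc, 3), accuracy_class(acc))
```

Under this relation, a mean-basis heritability above 0.81 is required for an accuracy above 0.90, which is consistent with GY falling in the high rather than the very high class.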
The BLUP estimates for DF (Figure 1A) indicated the existence of highly desirable genotypic values for wheat breeding programs (Beche et al. 2018). Most of the UFV lines and the EMBRAPA control cultivars were earlier than the control cultivars from private companies. By contrast, analysis of the DN trait (Figure 1B) revealed that the cultivars with the longest cycle were the least affected by diseases. This relationship can be explained by the fact that the cultivars with the longest cycles evaluated in this study were developed most recently and were thus released with reduced susceptibility to major diseases. The same pattern was observed for FLH and PH (Figure 1C, D), as the cultivars from private companies (TBIO Sintonia, TBIO Sossego, TBIO Ponteiro, TBIO Aton, TBIO Duque, and CD 1303) presented higher FLH and PH values in addition to greater DF and DN values. Breeding programs should aim to minimize FLH and PH values in selection (Richards et al. 2019), especially in the Brazilian Cerrado, because taller genotypes tend to lodge if the system is irrigated.
Cultivars from private companies produced the lowest SM genotypic values; however, this same pattern was not observed for GY (Figure 1E), for which these cultivars were superior to the new elite lines (Woyann et al. 2019). The TBIO Aton cultivar, which produced the third lowest SM genotypic value, presented the highest BLUP predicted for GY, at 5083.8 kg ha⁻¹ (Figure 1G). These results indicate that SM has no direct effect on GY. Therefore, selecting genotypes that produce lower SM is recommended. The correlation network showed that genotypes with the longest cycles were also those that reached taller heights and had the lowest incidence of disease (Figure 2). In addition, these long-cycle cultivars had lower SM, and higher SM estimates were associated with lower FLH and PH estimates. Genotypic correlations can be either transient, owing to genetic linkage, or permanent, owing to the presence of pleiotropic genes (Cruz et al. 2012). Knowledge of the existence and magnitude of correlations between traits is essential when selecting parents to develop superior cultivars through genetic diversity analyses.
The grouping of the genotypes using the UPGMA method (Figure 3) enabled stratification into six different groups with strong grouping consistency and a cophenetic correlation estimate of 0.81 between the original distance matrix and the dendrogram. Groups 1 and 2 were each composed of only one genotype, TBIO Aton and EPL18161, respectively. Group 3 was composed of the other TBIO cultivars and CD 1303. Groups 4 and 5 were each composed of seven UFV lines, and group 6, the largest, consisted of 17 UFV lines and the BRS 254, BRS 264, and BRS 394 cultivars (Table 2, Figure 3). Other studies have already focused on wheat genetic diversity (Khodadadi et al. 2011, Mwadzingeni et al. 2016); however, distances measured from phenotypic data may not always represent genetic diversity. For this reason, assessing diversity based on BLUP genotypic values is promising for breeding programs. A highly similar grouping pattern was found when the Tocher grouping method was used (Table 2). The six groups previously formed using the UPGMA method were condensed into three. All lines developed by UFV belonged to the same group, together with the cultivars developed by EMBRAPA. This can be explained by the fact that these cultivars were used as parents to obtain the lines in the past. In addition, TBIO Aton did not form a group with any other cultivar, even using the Tocher method, owing to its high and favorable estimated DN and GY genotypic values. Interpreting these genotypic values is extremely important for identifying highly promising parents to compose a crossing block. It is not enough for parents to be divergent; they also need to present high mean estimates for the traits of interest. Because of this, the practice of selection in segregating generations is promising.
We propose that the following two distinct groups of parents be formed with the aim of divergent selection. Group 1 will combine the TBIO Aton, TBIO Sintonia, TBIO Duque, TBIO Sossego, TBIO Ponteiro, and CD 1303 parents owing to their high GY and DN estimates. Parents that are resistant to wheat blast are essential, as this is a major disease of wheat production systems in central Brazil (Rocha et al. 2019). However, these genotypes had the disadvantages of longer cycles (66.16 to 69.30 days) and PH values slightly higher than those usually sought by breeders, ranging from 85.30 cm (TBIO Aton) to 94.35 cm (TBIO Sintonia).
The other group will consist of the genotypes belonging to group 6 of the UPGMA dendrogram, with emphasis on the BRS 254, BRS 264, BRS 394, VCU21898, VCU18169, and VCU11811 parents. The BRS genotypes are potentially complementary to the first group owing to their short DF genotypic values (57.25, 57.25, and 59.40 days for BRS 254, BRS 264, and BRS 394, respectively). In addition, all BRS genotypes produced low PH estimates (less than 88 cm) and high GY estimates (ranging from 4582.42 to 4716.87 kg ha⁻¹). The VCU21898, VCU18169, and VCU11811 genotypes, in addition to producing high GYs (4991.41, 4923.43, and 4780.97 kg ha⁻¹, respectively), have short cycles (<60 days) and reduced PH (<90 cm). Identifying parents that are complementary to the first group of the diallel and that present high genotypic values for GY is essential. Many genotypes have traits of interest for wheat breeding programs, but they are wild or little improved, which does not contribute to effective genetic gain in selection. Thus, we propose that these 12 parents be crossed in a partial diallel, in two different groups, each composed of six parents in a 6 × 6 scheme, resulting in 36 populations.
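The proposed crossing scheme can be enumerated directly: in a partial diallel, every parent of one group is crossed with every parent of the other group, and no crosses are made within groups. The short sketch below lists the combinations from the parent names given in the text; the variable names are illustrative, and the direction of each cross (which parent is used as female) is not specified in the text.

```python
from itertools import product

# Parents named in the text for each group of the partial diallel.
group_1 = ["TBIO Aton", "TBIO Sintonia", "TBIO Duque",
           "TBIO Sossego", "TBIO Ponteiro", "CD 1303"]
group_2 = ["BRS 254", "BRS 264", "BRS 394",
           "VCU21898", "VCU18169", "VCU11811"]

# Every parent of group 1 crossed with every parent of group 2.
crosses = list(product(group_1, group_2))

print(len(crosses))
for parent_1, parent_2 in crosses[:3]:
    print(f"{parent_1} x {parent_2}")
```

For comparison, a half diallel among all 12 parents without reciprocals would require 66 crosses, so restricting crosses to between-group combinations nearly halves the number of populations to be handled.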
Defining the parents that will form the groups of the partial diallel based on genetic diversity analyses is extremely effective in wheat selection programs, as the alleles fixed for a given trait in one group are generally different from those fixed for the same trait in the other group, generating complementarity between gene loci and, consequently, superior transgressive segregants. Some studies involving partial diallel analysis of wheat established the groups without criteria based on genetic diversity, which led to low genetic complementarity, both general and specific (Pimentel et al. 2013a, Pimentel et al. 2013b).
The relative importance of each trait was assessed using the principal component technique (Table 3), following the methodology of Jolliffe (1972). Less important traits that can be discarded in future wheat diversity research are those that presented the highest eigenvector estimates (in absolute value) in the principal components with the lowest eigenvalues. Jolliffe (1972) recommended discarding the traits associated with principal components whose eigenvalues are less than 0.70, meaning that the FLH, DF, and SM traits could be discarded in the present study. In addition to having little variability, these traits were also correlated with others measured in the current study. This method of evaluating the importance of traits has an additional advantage over the method of Singh (1981), which calculates the relative importance of traits by variability alone and disregards redundancy among them. In addition, owing to the lack of a residual covariance matrix to estimate the Mahalanobis distance, it is necessary to use the Euclidean distance in studies based on BLUP, and the method of Singh (1981) is considered inappropriate for such situations. As the first three principal components represented approximately 80% of the total variation, the graphic dispersion of scores is a consistent representation of genetic diversity. Thus, the two-dimensional representation of PC1 × PC2 and PC1 × PC3 (Figure 4) confirms the results obtained by both the UPGMA clustering method and the Tocher optimization method. A more cohesive group composed of the genotypes from the different private companies and another group composed of the majority of the UFV lines and the EMBRAPA cultivars can be seen. Furthermore, as in the other methods, the dissimilarity of the EPL18161 line was observed owing to its high PH and low DF estimates.

CONCLUSIONS
Evaluating genetic diversity based on the Euclidean distance calculated from genotypic values (BLUP) is highly promising for wheat breeding programs. By identifying divergent genotypes with three different biometric methodologies (UPGMA, Tocher, and principal components) and considering the high estimates of genotypic values for the traits of interest, it was possible to establish two groups of parents that will form a partial diallel aimed at complementarity and at developing progenies with high grain yield, disease resistance, shorter stature, and shorter cycles.