Predictive approach to optimize the number of visual graders for indirect selection of high-yielding Urochloa ruziziensis genotypes

Forage plant breeders often use visual scores to assess agronomic traits because of the costs associated with in-depth phenotyping in the initial stages of breeding cycles. The aim of this study was to investigate the impact of the number of graders on the effectiveness of indirect selection of high-yielding genotypes and determine an optimal number of graders in the early-stage trials of Urochloa ruziziensis. For that purpose, five graders assessed 2,219 U. ruziziensis genotypes in an augmented block design. Biomass production and vigor scores were evaluated in two cuts and were analyzed using a linear mixed model approach. Vigor scores were analyzed considering each grader’s score and the combinations of two, three, four, and five graders. Genetic variance was significant for both traits. Visual evaluation was effective in identifying productive genotypes based on the statistical criteria. The optimal number of graders for indirect selection of high-yielding U. ruziziensis genotypes is three.


INTRODUCTION
Urochloa ruziziensis (R. Germ. and CM Evrard) Crins. (synonym Brachiaria ruziziensis) is a forage species with high agronomic and nutritional potential widely used in breeding programs (Marcelino et al. 2020). Some agronomic traits in U. ruziziensis and other forages are often visually assessed under field conditions because this assessment is practical and effective. Traits routinely evaluated using such techniques include i) pest and disease resistance (Silva et al. 2013), ii) regrowth capacity (Berchembrock et al. 2020), iii) stay-green (Wilkinson and Hill 2003), and iv) plant growth and biomass production (Burton 1982, Price and Casler 2014).
Several factors might affect the effectiveness of visual selection in forage breeding, such as genetic control of the trait and the number of graders or evaluators. For quantitative traits, such as grain yield, some studies have reported low effectiveness (Abreu et al. 2010, Abadassi 2014). However, visual selection has achieved satisfactory results in U. ruziziensis for quantitative traits, such as pest resistance (Souza Sobrinho et al. 2010) and vigor (Teixeira et al. 2020). The number of graders was examined by Abreu et al. (2010), who investigated strategies to improve mass selection in maize using ten graders. Helms et al. (1995) indirectly selected high-yielding soybean lines using three graders. Bowman et al. (2004) also used three graders to select for yield in cotton. However, none of these studies explained why the respective numbers of graders were adopted. Similarly, in U. ruziziensis, some studies used visual evaluation to derive phenotypic measurements (Teixeira et al. 2020), but the importance or effect of the number of graders on the results presented was not discussed.
The applicability of visual evaluation goes beyond the scope of forage breeding. Animal breeders commonly assess relevant traits using visual scores. For instance, the percentage of muscle and fat, marketable cuts, and carcass traits are assessed via visual scores (Conroy et al. 2010). However, as with plants, an explanation that justifies the use of a certain number of graders is lacking. Toral et al. (2011) estimated the response to selection for post-weaning daily weight gains involving 47,000 animals using visual scores on related traits (i.e., conformation, precocity, and muscling). In that case, groups of animals were established and a single grader ranked each group; however, no reasons were given for adopting a single-grader approach.
Determining an optimal number of graders could provide better estimates of genetic parameters and more accurate breeding values. Equally important, this optimal number is likely to improve the effectiveness of the breeding pipeline and, ultimately, the rates of genetic gain. However, before determining this number, the correlation of selection with the trait of interest and the quality of the experimental data set must be assessed (Price and Casler 2014). Once these estimates are available and proven to be useful, a predictive approach for optimizing the number of graders should be performed. Therefore, the objectives of this study were to i) investigate whether the number of graders affects the effectiveness of indirect selection of high-yielding U. ruziziensis genotypes and ii) determine an optimal number of graders to assess U. ruziziensis genotypes within a forage breeding program.

MATERIAL AND METHODS
The experiment was conducted at the Embrapa Gado de Leite (Embrapa Dairy Cattle) experimental station (lat 21° 33' S, long 43° 06' W, alt 410 m asl) in Coronel Pacheco, MG, Brazil. The soil type at this site was predominantly an Argissolo Vermelho-Amarelo Alumínico (Santos et al. 2006). The climate is classified as humid subtropical (Cwa type mesothermic) according to Köppen, characterized by low rainfall during the winter season and an annual mean temperature of 19 °C. The region typically has dry and cold winters, whereas the summer season is predominantly humid with moderate to high temperatures. Mean annual rainfall is 1,536 mm.
An experiment in an augmented block design, consisting of 51 blocks with 28 to 72 genotypes per block, was established in August 2011. Regular treatments consisted of 2,219 U. ruziziensis genotypes derived from selected seeds in the second recurrent intraspecific cycle of the Embrapa Dairy Cattle breeding program. Two check cultivars, Marandu (U. brizantha) and Basilisk (U. decumbens), were the replicated treatments. Each plot consisted of a single plant occupying an average area of 1.5 m², and plots were spaced one meter apart.
At planting, 350 kg ha⁻¹ of the formulation 8-28-16 (nitrogen-phosphorus-potassium) was applied. For topdressing fertilization, 1 t ha⁻¹ of the formulation 20-05-20 (nitrogen-phosphorus-potassium) was split across the cuts. Manual weeding was performed when necessary, and a cut to homogenize the trial was carried out at the beginning of December 2011. In 2012, two evaluation cuts were performed, one in January and another about 40 days later.
Biomass production (grams) and vigor scores associated with plant yield were measured as follows. Visual scores ranged from 1 to 5 according to the following scale: 1 = very bad, 2 = bad, 3 = average, 4 = good, and 5 = very good. Five graders, including forage breeders, graduate students, and technicians, performed visual phenotyping right before each cut. All five graders assessed all genotypes in both cuts. To evaluate biomass production, plants were cut at 5.0 cm above soil level; the harvested biomass was then weighed using a portable suspension scale.
Phenotypic data were analyzed using a mixed model approach (Henderson 1984) with recovery of interblock and intergenotypic information. Biomass production data were fitted according to statistical model 1, while model 2 was used for vigor score data.
where $y$ is the vector of biomass production or vigor score data; $\mu$ is the intercept; $X$, $Z_1$, $Z_2$, and $Z_3$ are incidence matrices of model effects; $b$ is the vector of block effects, $b \sim N(0, I\sigma^2_b)$; $g$ is the vector of genotype effects, $g \sim N(0, I\sigma^2_g)$; $s$ is the vector of grader effects, $s \sim N(0, I\sigma^2_s)$; and $e$ is the vector of errors, $e \sim N(0, I\sigma^2_e)$. Specifically for model 2, $y$ corresponded to the vigor score attributed by each grader in the case of two or more graders.
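The grader subsets over which model 2 is fitted can be listed explicitly. The following is a minimal sketch, assuming that every possible subset of the five graders is analyzed; the labels G1 to G5 are hypothetical, since the text identifies the graders only by their number.

```python
from itertools import combinations

# Hypothetical labels: the study reports five graders but does not name them.
GRADERS = ["G1", "G2", "G3", "G4", "G5"]


def grader_subsets(graders):
    """Map each subset size k to the list of all k-grader combinations."""
    return {k: list(combinations(graders, k)) for k in range(1, len(graders) + 1)}


subsets = grader_subsets(GRADERS)
counts = {k: len(v) for k, v in subsets.items()}
print(counts)  # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
```

Under this assumption there are 31 analyses in total, and each boxplot for a given number of graders would summarize 5, 10, 10, 5, or 1 estimates, respectively.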
For both traits, the estimation of fixed effects (best linear unbiased estimator, BLUE) and the prediction of random effects (best linear unbiased prediction, BLUP) were obtained by solving Henderson's mixed model equations (Henderson 1984). Variance components were estimated by the restricted maximum likelihood (REML) method (Patterson and Thompson 1971). The lmer function of the lme4 R package was used to fit the mixed model (Bates et al. 2015). The significance of variance components was evaluated by the likelihood ratio test at a significance level of 5% using the ranova function of the lmerTest R package (Kuznetsova et al. 2017).
Accuracy on a genotype basis was estimated using the following estimator (Henderson 1984): $\hat{r}_{\hat{g}g} = (1 - \mathrm{PEV}/\hat{\sigma}^2_g)^{1/2}$, where PEV is the average prediction error variance associated with the BLUPs of the genotypes. Accuracy was interpreted as a measure associated with precision in selection according to Resende and Duarte (2007). In addition, the coefficient of experimental variation (CV) was estimated for both traits.
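The accuracy estimator above can be illustrated with a short numerical sketch. The function name, the input values, and the range check are illustrative assumptions and are not taken from the study.

```python
import math


def genotype_accuracy(pev_values, sigma2_g):
    """Accuracy = sqrt(1 - mean(PEV) / genetic variance)."""
    if sigma2_g <= 0:
        raise ValueError("genetic variance must be positive")
    mean_pev = sum(pev_values) / len(pev_values)
    ratio = mean_pev / sigma2_g
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("mean PEV must lie between 0 and the genetic variance")
    return math.sqrt(1.0 - ratio)


# Made-up prediction error variances for two genotypes and a made-up
# genetic variance; the mean PEV here is 0.36.
example_accuracy = genotype_accuracy([0.30, 0.42], sigma2_g=1.0)
print(round(example_accuracy, 2))  # 0.8
```

As the average PEV approaches the genetic variance, the accuracy tends to zero; as it approaches zero, the accuracy tends to one.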
The effectiveness of indirect selection for biomass production based on visual selection for vigor scores was assessed by estimation of the following criteria regarding the variable number of graders: the Spearman correlation between the BLUPs of genotypes for vigor score and biomass production, and relative genetic gain for biomass production regarding the top 5% of selected genotypes based on the vigor score.
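The first criterion, the Spearman correlation, is the Pearson correlation computed on ranks. The sketch below uses plain Python with average ranks for ties; the BLUP values are invented for illustration, and the study itself does not state which software routine was used for this step.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented BLUPs for five genotypes (vigor score and biomass production).
vigor_blups = [0.5, 1.2, -0.3, 0.9, 0.1]
biomass_blups = [40.0, 55.0, 10.0, 60.0, 20.0]
rho = spearman(vigor_blups, biomass_blups)
print(round(rho, 2))  # 0.9
```

In this toy case only the two best genotypes swap positions between the two rankings, which is why the correlation stays high.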
The relative genetic gain (%) was estimated as $RG = (\mathrm{BLUP}_{y/y'} / \mathrm{BLUP}_{y}) \times 100$, where $\mathrm{BLUP}_{y/y'}$ is the indirect genetic gain for biomass production, computed as the average of the BLUPs for biomass production (y) of the top 5% of genotypes ranked by their BLUPs for vigor score (y') under each number of visual graders, and $\mathrm{BLUP}_{y}$ is the direct genetic gain, computed as the average of the BLUPs for biomass production (y) of the top 5% of high-yielding genotypes.
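The relative genetic gain can be sketched in a few lines. The function names, the rounding rule for the number of selected genotypes, and the toy BLUPs are assumptions made for illustration; a 20% selected fraction is used only because the toy set has ten genotypes, whereas the study used 5% of 2,219.

```python
def top_indices(blups, fraction):
    """Indices of the best genotypes; the selected count is rounded (an assumption)."""
    n_selected = max(1, round(fraction * len(blups)))
    order = sorted(range(len(blups)), key=lambda i: blups[i], reverse=True)
    return order[:n_selected]


def relative_gain(biomass_blups, vigor_blups, fraction=0.05):
    """RG (%) = 100 * indirect gain / direct gain, both measured on biomass BLUPs."""
    direct_idx = top_indices(biomass_blups, fraction)
    indirect_idx = top_indices(vigor_blups, fraction)
    direct = sum(biomass_blups[i] for i in direct_idx) / len(direct_idx)
    indirect = sum(biomass_blups[i] for i in indirect_idx) / len(indirect_idx)
    if direct <= 0:
        raise ValueError("direct gain must be positive for RG to be meaningful")
    return 100.0 * indirect / direct


# Invented BLUPs for ten genotypes.
toy_biomass = [10.0, 8.0, 6.0, 4.0, 2.0, 0.0, -2.0, -4.0, -6.0, -8.0]
toy_vigor = [0.9, 0.2, 1.0, 0.5, 0.1, 0.0, -0.2, -0.4, -0.6, -0.8]
rg = relative_gain(toy_biomass, toy_vigor, fraction=0.2)
print(round(rg, 1))  # 88.9
```

Here direct selection keeps the genotypes with biomass BLUPs of 10 and 8 (mean 9), while selection on vigor keeps those with biomass BLUPs of 6 and 10 (mean 8), so the indirect route recovers 8/9 of the direct gain.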
Parameter estimates were visualized graphically using boxplots created in the stat R package (R Core Team 2019).

RESULTS
Genetic variance for biomass production and vigor score was significant at a nominal significance level of 5% (Table 1). The variance component associated with the grader effect was also significant regarding the average of the vigor score from five graders (Table 1). The accuracy on a genotype basis was 0.64 for biomass production, while for vigor scores, the values ranged from 0.56 to 0.74 among graders. For the average score from five graders, the accuracy was higher (0.96). The accuracies observed indicate high selection reliability for both traits. Lower CV estimates for vigor scores (13-21%) compared to those for biomass production (30%) were also observed (Table 1).

JMO Fonseca et al.
Estimates of genetic variance (Figure 1), accuracy (Figure 2), and CV (Figure 3) varied according to the number of graders. The variation was substantial when fewer graders were used, and the average of the estimates stabilized with three or more graders. Similar results were found for the correlations between the BLUPs for biomass production and vigor scores (Figure 4). Correlation averages across the grader combinations ranged from 77% with one grader to 86% with five graders.
Comparisons among indirect selections based on different numbers of graders showed significant differences (Figure 5). The expected direct gain for biomass production was 37% when the top 5% of high-yielding genotypes were selected. The relative genetic gain for biomass production from indirect selection of the top 5% of genotypes based on visual scores was substantial (68-77%) and increased with the number of graders (Figure 5).

DISCUSSION
The effectiveness of the vigor score for selection was investigated under various numbers of graders by estimating the correlation between the genotypic BLUPs for biomass production and vigor score and the relative genetic gain. Results from this experiment showed a significant increase in these parameters as the number of graders increased. Indirect genetic selection for biomass production based on vigor scores from three or more graders was able to recover at least 74% of the direct gain. This result indicates that vigor scores are a feasible means of indirectly selecting high-yielding genotypes (Figures 4 and 5). The boxplots for a single grader scoring all genotypes show that most of the genotypes are misclassified at a 5% selection intensity. Hence, decisions based on a single grader could impair the progress of a forage breeding program.
When three or more graders assessed all genotypes, the number of true-positive genotypes (i.e., plants that look good and have better yield) increased, while the false-negative rate remained stable. However, indirect selection can also be used to eliminate poor and non-adapted genotypes. In this case, even though some promising genotypes could be eliminated (type II error), most of the genotypes expressing unsatisfactory performance are correctly classified. Hence, forage breeding programs could benefit from allocating resources to fewer genotypes in the following generations. Nevertheless, in both cases (selecting versus discarding), it is important to consider that perennial grass breeding programs require a distinctive regrowth evaluation (Berchembrock et al. 2020), which limits the number of generations and selection cycles per year.
Various forage breeding programs have used visual evaluation to assess traits at different stages of selection cycles for crops such as red clover (Riday 2009), U. ruziziensis (Souza Sobrinho et al. 2010, Teixeira et al. 2020), U. decumbens (Matias et al. 2020), U. humidicola (Figueiredo et al. 2019), and Urochloa spp. Teixeira et al. (2020) showed the effectiveness of visual selection for green biomass yield based on plant vigor in U. ruziziensis. Souza Sobrinho et al. (2010) also highlighted the efficiency of visual evaluation in U. ruziziensis by indirectly selecting genotypes resistant to spittlebugs. That study described the evaluation of U. ruziziensis genotypes in the early stage of a selection cycle at the Embrapa breeding program. That stage is characterized by a large number of genotypes expressing high variability in biomass production, and scores are effective there because they make phenotyping practical and save resources. Moreover, the visual vigor score has high heritability, achieving values greater than 0.88 when three or more graders are used, which enables its use in a U. ruziziensis breeding program.
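The true-positive and false-negative reasoning used earlier in this section can be made concrete by comparing the set of genotypes selected on vigor scores with the set that would be selected directly on biomass production. The following is a minimal sketch; the function name, the rounding rule, and the toy BLUPs are assumptions, and the source does not give its exact classification procedure.

```python
def selection_confusion(biomass_blups, vigor_blups, fraction=0.05):
    """Count agreements and disagreements between direct and indirect selection."""
    n = len(biomass_blups)
    n_selected = max(1, round(fraction * n))  # rounding rule is an assumption

    def top(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        return set(order[:n_selected])

    truly_best = top(biomass_blups)  # would be kept under direct selection
    picked = top(vigor_blups)        # kept under visual (indirect) selection
    tp = len(truly_best & picked)    # good-looking and high-yielding
    fp = len(picked - truly_best)    # good-looking but not high-yielding
    fn = len(truly_best - picked)    # high-yielding but discarded
    tn = n - tp - fp - fn            # correctly discarded
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Invented BLUPs for ten genotypes; a 20% fraction keeps two of them.
toy_biomass = [10.0, 8.0, 6.0, 4.0, 2.0, 0.0, -2.0, -4.0, -6.0, -8.0]
toy_vigor = [0.9, 0.2, 1.0, 0.5, 0.1, 0.0, -0.2, -0.4, -0.6, -0.8]
confusion = selection_confusion(toy_biomass, toy_vigor, fraction=0.2)
print(confusion)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 7}
```

Because both selected sets have the same size in this sketch, the false-positive and false-negative counts always coincide; a different definition, for instance one with a separate threshold for discarding, would be needed to let them vary independently.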
Another aspect related to the evaluation of genotypes in the early stage of a selection cycle is the use of the augmented block design (ABD). This design is recommended at such stages because of its flexibility and especially because it allows testing of non-replicated genotypes. However, its estimates of variance components are generally less accurate than estimates from other designs, such as the randomized complete block design (RCBD) (Santos et al. 2002). Figueiredo et al. (2012) found higher accuracy values when analyzing the regrowth score in Urochloa humidicola in an RCBD. The fact that estimates from the ABD are less accurate might partially explain the low magnitudes of genetic variance estimates for biomass production and vigor scores (Table 1). Even so, it is noteworthy that at this early stage, the differences to be detected are larger, so the use of the ABD is feasible. Moreover, the ABD allows evaluation of more genotypes and application of high selection intensity, which result in higher genetic gain (Silva Filho 2013).

Several factors might affect the effectiveness of visual evaluation. Results of this study showed that the number of graders affects genetic variance, accuracy, and CV estimates, especially when a single grader is considered. In addition, the use of three or more graders to score genotypes proved to be more reliable. Furthermore, it should be highlighted that the experience of the grader, the difficulty of scoring some traits, and the time consumption associated with visual rating might also affect the effectiveness of visual selection. The breeder should also consider new tools for indirect selection of high-yielding U. ruziziensis genotypes, such as the use of high-throughput phenotyping (HTP) based on remote sensing (Yang et al. 2017, Maes and Steppe 2019). However, the practical use of HTP in U. ruziziensis experiments requires more studies to validate its protocol in the routine practices of the forage breeding program.
Finally, our results suggest that the optimal number of graders might be three because the estimates do not change substantially beyond that number, which makes the inclusion of more graders unnecessary. However, it should be noted that these results are limited to the current study. More studies considering U. ruziziensis under different conditions are needed, such as more cuts, different seasons (rainy or dry seasons), and different sowing dates. Such studies would make the estimate of the optimal number of graders more precise.

CONCLUSION
The number of graders affects the effectiveness of visual evaluation for biomass production in U. ruziziensis. For optimization purposes, the use of three graders is recommended.