Dynamics, diversity and experimental precision in final irrigated rice testing: a time meta-analysis

The objective of this study was to investigate diversity, experimental conditions and dynamics of the Irrigated Rice Breeding Program of Minas Gerais – Brazil, by meta-analysis. The target traits were grain yield, plant height and days to flowering. Evaluations were based on estimates of 376 genotypes grown at three locations, in two final comparative trials, in 14 growing seasons. Stability of the overall averages of the traits plant height and days to flowering was observed, indicating an adequate plant height and medium cycle. High average yields (>5,000 kg ha-1), good experimental accuracy and genetic variability were recorded. However, the genetic variability of all traits decreased over time, indicating the need to increase genotypic diversity. The parameter estimates of the morphoagronomic traits studied in time meta-analysis indicated the dynamic nature and good accuracy of the Irrigated Rice Breeding Program of Minas Gerais.


INTRODUCTION
Rice (Oryza sativa L.) is the staple food of more than half of the human population. Reportedly, global rice consumption increased by 8.8% in the past decade. However, to meet the rising food demand of the growing world population, rice production will have to be increased by more than 30% over the next 30 years in the same cultivation area (Rejesus et al. 2012, Mieulet et al. 2016, Sandhu et al. 2018).
The results of agricultural experiments with rice or any other crop, compiled in databases, can be analyzed together to understand the breadth of information and the data dynamics in a temporal evaluation. The dynamics of plant breeding programs have been analyzed based on the genotype replacement rate over time (Reis et al. 2015, Follmann et al. 2017), which is calculated from the inclusion, exclusion and renewal of cultivars and is considered the most efficient way to assess the dynamics of performance (Federizzi et al. 2012, Ceccarelli 2015). However, this form of evaluation does not directly represent the pillars of genetic breeding, e.g., the diversity of plant material or the accuracy of experimental performance, among other parameters. Estimates of genetic parameters are essential for inferring the predicted gains from selection and for guiding selection strategies that allow breeders to make the most efficient use of these parameters (Otoboni et al. 2020). To draw consistent conclusions, all aspects of the experiments must be evaluated. Nevertheless, some factors (e.g., weather, locations, pests and diseases, sampling errors) may cause great variability between experiments, which can affect the results. In this case, data recorded under different conditions must be combined to enable more accurate and precise conclusions than can be drawn from a single source of information. An effective and accurate way of analyzing this accumulated information is meta-analysis.
This kind of analysis is a statistical approach to combine multiple, independent studies, carried out under experimental conditions that differ in terms of growing season, location, soil texture, timeframe and other aspects, to assess the consistency of results (Fisher 2015, Toler et al. 2019). Meta-analyses are extensively used, exploiting information compiled in systematic reviews, particularly in the field of medicine. Yet, in plant breeding programs, meta-analyses can also be used for combined analyses of specific and restricted experimental data, since, according to Eloy et al. (2018), they can enhance the precision of evaluations of treatment effects and help consolidate isolated results. In this context, meta-analyses are based on summarizing specific information from experimental analyses in a single measure. The combined estimate is also called the meta-analytic estimate, and such combined analyses based on long-term experimentation are named time meta-analysis.
Meta-analysis is commonly used in animal breeding programs (Azevêdo et al. 2010, Del Claro et al. 2011, Vieira et al. 2013, Eloy et al. 2018) to increase their efficiency. However, for plant breeding, meta-analysis is not yet as widely applied. For rice studies in Brazil, meta-analysis has been used for upland rice (Breseghello et al. 2011) and for irrigated rice (Streck et al. 2018) in the central north and southern regions of Brazil, respectively. To fill this gap, this study evaluated the experimental conditions, diversity and dynamics of the Irrigated Rice Breeding Program of Minas Gerais, from 2004 to 2018, based on meta-analytic estimates of parameters related to experimental accuracy, genetic variability and averages of morpho-agronomic rice traits.

Data collection
The evaluated data were the result of the last two comparative trials of the irrigated rice breeding program in Minas Gerais, Brazil, i.e., the Preliminary Comparative Trials (PCT) and Advanced Comparative Trials (ACT) or trials for the Value for Cultivation and Use (VCU), carried out by the Agricultural Research Company of Minas Gerais (EPAMIG). The PCTs were carried out at the Experimental Station Leopoldina (CELP) in Leopoldina, MG (lat 21° 31' 48.01" S, long 42° 38' 24.00" W, alt 257.19 m asl) and at the Experimental Station Lambari (CELB) in Lambari, MG (lat 21° 58' 11.24" S, long 45° 20' 59.60" W, alt 887.55 m asl).
The VCU trials were carried out at three locations: Leopoldina, Lambari and Nova Porteirinha, MG, the latter at the Experimental Station Gorotuba (CEGR) (lat 15° 48' 0.77" S, long 43° 17' 59.09" W, alt 533.77 m asl). The trials were set up between the spring and summer growing seasons (between October and March), with 25 entries (lines and cultivars) in the VCUs and 36 in the PCTs, except in 2013/14 and 2017/18, when 30 entries were tested in the PCTs. The PCTs were laid out in a triple lattice design and the VCUs in a randomized block design with three replications. The PCT plots consisted of four 5-m rows, spaced 0.3 m apart, and the evaluated area was the middle 4 m of the two center rows, to exclude any border effects. In the VCUs, the experimental plots consisted of six rows from 2004 to 2007 and of five rows from 2008 onwards, 5 m long and spaced 0.3 m apart, and the assessed area was the middle 4 m of the three center rows, to exclude border effects.
The trials were planted on floodplain soils under continuous flood irrigation. Agronomic management followed the typical recommendations for these regions (EMBRAPA 1977). Evaluations were performed for grain yield (kg ha-1) adjusted to 13% moisture, plant height at maturity, measured as the main stem length (cm), and days to flowering, i.e., the number of days from emergence to 50% flowering. The experimental plots were harvested manually.

Statistical analyses
Data were subjected to individual analysis of variance (ANOVA), i.e., one analysis per trial, location and year, to determine the mean (μ), heritability (h²) and coefficient of variation (CV) of each trait. Subsequently, an exploratory analysis of each parameter set was performed to examine the distribution, the main summary statistics and the presence of outliers, using box plots (Bussab and Morettin 2017). For the analyses, the data were grouped according to the locations. The datasets were also analyzed for possible study effects, to check for variability between studies (locations), as recommended by Koricheva and Gurevitch (2014) in a long-term systematic review.
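The per-trial parameters described above can be illustrated with a minimal sketch for a randomized complete block trial. The text does not state the estimators used, so the formulas below are assumptions: CV = 100·√MSE/mean and a mean-based heritability h² = (MSG − MSE)/MSG. The function name `trial_parameters` and the input layout are hypothetical, and the lattice trials (PCTs) would require a different analysis.

```python
from math import sqrt

def trial_parameters(plots):
    """Return mean, CV (%) and h2 for one trial.

    plots[g][b] is the plot value of genotype g in block b
    (randomized complete blocks, one plot per genotype per block).
    Assumed estimators: CV = 100*sqrt(MSE)/mean, h2 = (MSG - MSE)/MSG.
    """
    n_gen, n_blk = len(plots), len(plots[0])
    grand = sum(sum(row) for row in plots) / (n_gen * n_blk)
    gen_means = [sum(row) / n_blk for row in plots]
    blk_means = [sum(plots[g][b] for g in range(n_gen)) / n_gen
                 for b in range(n_blk)]

    # Sums of squares of the two-way ANOVA without interaction
    ss_gen = n_blk * sum((m - grand) ** 2 for m in gen_means)
    ss_blk = n_gen * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y - grand) ** 2 for row in plots for y in row)
    ss_err = max(ss_tot - ss_gen - ss_blk, 0.0)  # guard against rounding

    ms_gen = ss_gen / (n_gen - 1)
    ms_err = ss_err / ((n_gen - 1) * (n_blk - 1))

    cv = 100 * sqrt(ms_err) / grand
    h2 = max(0.0, (ms_gen - ms_err) / ms_gen)  # truncated at zero
    return {"mean": grand, "cv": cv, "h2": h2}
```

Each trial (location × year) would yield one such triple per trait, and these triples are the observations that enter the meta-analysis.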

Meta-analysis
Simple, fixed and mixed regression models were fitted to identify the most appropriate one for meta-analytic estimation. The Akaike (AIC) and Bayesian (BIC) information criteria and the log-likelihood (logLik) were used to compare the regression models, and the likelihood-ratio test was used to check for differences between them, as recommended by Viechtbauer (2010). All analyses were performed with the software Genes (Cruz 2016) and R (R Core Team 2020).
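As an illustration of the comparison criteria named above, the following sketch computes AIC, BIC and a likelihood-ratio test from log-likelihoods that are assumed to be already available from the fitted models. The function names are hypothetical, the standard definitions AIC = 2k − 2 logLik and BIC = k ln n − 2 logLik are used, and the chi-square tail probability is implemented only for 1 or 2 degrees of freedom to keep the example free of external libraries.

```python
from math import erfc, exp, log, sqrt

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * log(n) - 2 * loglik

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return (statistic, p-value) for two nested models.

    df is the difference in the number of parameters; only df = 1 or 2
    is supported here (closed-form chi-square tail probabilities).
    """
    stat = max(2 * (loglik_full - loglik_reduced), 0.0)
    if df == 1:
        p = erfc(sqrt(stat / 2))
    elif df == 2:
        p = exp(-stat / 2)
    else:
        raise ValueError("this sketch supports df = 1 or 2 only")
    return stat, p
```

Lower AIC and BIC and a higher logLik indicate the better-fitting model. When models differ in their fixed part, likelihoods obtained by maximum likelihood rather than REML are normally the comparable ones.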

Simple regression model
The simple regression model was used to demonstrate the presence of variability between studies (locations), since this model does not take this variability into account. Thus, the simple regression model was given by:
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
where $Y_i$ is the plot-level data; $\beta_0$ is the intercept; $\beta_1$ is the slope (of the year-centered mean); $X_i$ is the parameter value in experiment $i$; and $e_i$ is the residual error, $e_i \sim N(0, \sigma^2)$.
The intercept values ($\beta_0$) reflect the meta-analytic estimates of the parameter over the years of assessment, and the slope value ($\beta_1$) (weighting coefficient of the variable year) indicates the magnitude and direction of the variation of a parameter, and thus the gain for the trait. The genetic gain was calculated as the slope/intercept ratio, as proposed by Breseghello et al. (2011).
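A minimal sketch of this calculation is given below, assuming that the response is the per-trial parameter estimate and the regressor is the year centered on the mean of the evaluated years, so that the intercept equals the overall mean of the estimates. The function names and input layout are hypothetical, and the gain is expressed as a percentage of the intercept.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def relative_gain(years, estimates):
    """Return (intercept, slope, gain in % per year) with centered years."""
    center = sum(years) / len(years)
    centered = [t - center for t in years]
    b0, b1 = fit_line(centered, estimates)
    return b0, b1, 100 * b1 / b0
```

With centered years, the intercept is the mean of the estimates over the whole period, so the slope/intercept ratio reads directly as the yearly change relative to the average level of the parameter.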

Fixed regression model: fixed effect + effect between studies
In the fixed-effects model, it was assumed that the various studies contributing to the problem of interest share the same true effect size, so that the estimated parameter is a common effect size for all studies (Sangnawakij et al. 2019). In this context, the effect caused by the data derived from different studies was taken into account as homogeneous in the fixed-effects model. Thus, the fixed-effects model was given by:
$$Y_{ij} = \beta_0 + S_i + \beta_1 X_{ij} + \beta_i X_{ij} + e_{ij}$$
where $Y_{ij}$ is the plot-level data; $\beta_0$ is the intercept; $S_i$ is the fixed effect of group $i$; $\beta_1$ is the slope (of the year-centered mean); $X_{ij}$ is the parameter value in experiment $j$ of study $i$; $\beta_i$ is the fixed effect of study $i$ on the slope; and $e_{ij}$ is the residual error, $e_{ij} \sim N(0, \sigma^2)$.
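The structure of this model can be sketched as follows, under two assumptions not stated in the text: the study effects are constrained to sum to zero, and every location has its own intercept and slope. Under these assumptions the model is equivalent to fitting one regression line per location, with the overall intercept and slope given by the averages of the location-level values. The function names and the input layout are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fixed_study_model(data):
    """Fit location-specific lines and express them as deviations.

    data maps a location name to (centered_years, estimates).
    Returns (beta0, beta1, effects), where effects[location] holds the
    intercept deviation (S_i) and slope deviation (beta_i), both
    summing to zero over locations.
    """
    fits = {loc: fit_line(x, y) for loc, (x, y) in data.items()}
    k = len(fits)
    beta0 = sum(b0 for b0, _ in fits.values()) / k
    beta1 = sum(b1 for _, b1 in fits.values()) / k
    effects = {loc: (b0 - beta0, b1 - beta1)
               for loc, (b0, b1) in fits.items()}
    return beta0, beta1, effects
```

Because each location keeps its own line, a trend that differs in direction between locations is not averaged away before estimation, which is the main contrast with the simple regression model.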

Mixed regression model: fixed effect + random effect + effect between studies
In contrast to the fixed regression model, the random-effects model treats the true effect size of the studies involved in the meta-analysis as a random sample of the population effect size distribution, so that each group can contribute a different effect (Sangnawakij et al. 2019). In this context, the mixed regression model, similar to the fixed-effects model, takes into account the effect factor between groups (studies). Moreover, it also takes into account effects other than those related to the fixed part of the model, called random effects, caused by the heterogeneity between studies (locations). In this model, the meta-analytic estimates were obtained by the Restricted Maximum Likelihood (REML) method. The mixed regression model was described by:
$$Y_{ij} = \beta_0 + s^*_i + \beta_1 X_{ij} + b^*_i X_{ij} + e_{ij}$$
where $Y_{ij}$ is the plot-level data; $\beta_0$ is the intercept; $\beta_1$ is the slope (of the year-centered mean); $X_{ij}$ is the parameter value in experiment $j$ of study $i$; $s^*_i$ is the random effect of study $i$; $b^*_i$ is the random effect of study $i$ on the slope; and $e_{ij}$ is the residual error.

RESULTS AND DISCUSSION
Univariate analysis of variance of the traits indicated significant differences (by the F test at 5% probability) for the traits in most of the experiments. This indicates the presence of genetic variability among the different evaluated entries, justifying the use of meta-analysis to establish combined parameter estimates.
As suggested by Bussab and Morettin (2017), prior to the experimental analysis, an exploratory analysis of the data was performed with removal of all outliers, and the distribution and statistics of the data set were analyzed. Outliers were absent only from the mean estimates (μ) of grain yield and plant height and from the heritability (h²) of grain yield (Figure 1A). The outliers were removed since they lead to larger residual errors and consequently increase the standard deviation of the intercept ($\beta_0$) in regression analyses (Viechtbauer 2010). The presence of outliers indicates errors that weaken trials, due to the inherent variability in plant material and/or data collection problems; ultimately, outliers can mask important results and influence the final inferences (Freitas et al. 2008). High amplitudes of variation for the parameter estimates were observed, especially for those based on datasets including outliers (Figure 1A). This shows the great variation between experiments, requiring the combination of specific information from these experiments to establish more accurate parameter estimates. After outlier removal, the respective amplitudes of the estimates decreased considerably (Figure 1B), eliminating possible biases. Notably, the exclusion of outliers for h² of days to flowering decreased the amplitude of variation by approximately 80% (Figure 1). Without the outliers, the database of the breeding program proved representative of the different rice cultivation conditions, due to the variation and magnitudes of the estimates (Breseghello et al. 2011, Morais Júnior et al. 2015, Streck et al. 2018).
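The outlier screening can be sketched as below. The text states only that box plots were used, so the conventional fence of 1.5 times the interquartile range is an assumption, as are the function name and the choice of the inclusive quartile method (which matches the default quantile type in R).

```python
from statistics import quantiles

def split_outliers(values, k=1.5):
    """Split values into (kept, removed) using box-plot fences.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile (k = 1.5 is an assumption).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    removed = [v for v in values if v < lower or v > upper]
    return kept, removed
```

Applied separately to each parameter set (μ, h² and CV of each trait), the function returns the estimates retained for the regression models and those set aside as outliers.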
Based on the time meta-analysis, meta-analytical estimates of parameters indispensable for breeding were established, which reflect the diversity of the studied plant material, the experimental precision and the dynamics of the Irrigated Rice Breeding Program of Minas Gerais from 2004 to 2017.

Simple regression model
The results of the simple regression analysis (Table 1) showed that the estimated intercept ($\beta_0$) corresponded to the respective arithmetic means of the parameter estimates. Likewise, the estimated slope values ($\beta_1$) were given by the respective arithmetic means of slopes between locations.

Fixed regression model
In the fixed regression model, the sign of the estimated slope (Table 1) for the CV of grain yield (0.04) and for μ of days to flowering (0.06) was opposite to that obtained by the simple regression model (-0.0007 and -0.05, respectively). Although the magnitude of these values does not differ greatly between the models, the results from the simple model could be misinterpreted because of the direction of the slope. According to Toler et al. (2019), changes in the study effect could potentially result from publication bias, changes in the methodology or real biological changes.

WG Costa et al.
For grain yield, however, in view of the magnitudes of the intercepts and the biological interpretation of the parameters, the simple regression model indicated a gain of 34 kg year-1, whereas the gain by the fixed regression model was 4.22 kg year-1. In other words, the simple model indicated an average yield gain per year about eight times greater than the fixed regression model. This difference could have a direct impact on the objectives of the program, i.e., the decision-making process and conclusions about the effectiveness of the breeding program could be affected. In addition, the standard deviations of both intercept and slope were lower in the fixed than in the simple model for most parameters (Table 1). These results justify the choice of a model that takes into account the variability among locations to ensure correct analysis results.

Mixed regression model
The analysis of the mixed regression model (Table 1) showed that the intercept and slope values of the parameter estimates did not differ greatly from those obtained by the fixed regression model. The standard deviations of the intercept and slope were higher in the mixed than in the fixed model (Table 1). The standard deviation of the random-effects estimate in the mixed model is wider than that of the fixed-effect estimate in the fixed model (Schwarzer et al. 2015). This can be explained by the estimation method of the mixed model, which considers the existence of heterogeneity among locations.

Comparison of fitting between models
The fixed regression model proved advantageous, since the h² and μ estimates for the three traits and the CV of grain yield had lower AIC and BIC values and higher logLik values (Table 2). With regard to the CV estimates for plant height and days to flowering, the likelihood-ratio test showed no significant difference (p > 0.05) between the models (Table 2). In these cases, the fixed-effects model was also preferable, due to the higher coefficient of determination (R²) (Table 2). Therefore, the fixed regression model was the most adequate to predict all parameter estimates for all three traits over the years.
In other studies, model comparisons were made to improve model fitting for the prediction of meta-analytic estimates of the parameters (Azevêdo et al. 2010, Vieira et al. 2013) or to evaluate animal performance by meta-analysis (Eloy et al. 2018). Adopting the AIC and BIC criteria, these authors concluded that the best option was the mixed model (random effect). The difference from the present study can be explained by the greater number of studies (groups) included in those analyses, which increased the heterogeneity among them.
In general, the results of our study showed that after outlier exclusion, the magnitudes of the intercepts and slopes of the parameter estimates were similar by any regression model. However, the fixed and mixed models are better predictors of meta-analytic estimates, since these models also take into account the variability among locations. In addition, as the fixed effects model was advantageous, this variability is small, i.e. the variance among locations can be considered homogeneous.
A meta-analysis of a historical series must be used when specific information on the program management style and transitions in the breeding process are expected. This kind of critical analysis of the program efficiency would obtain more reliable results to help draft new actions and strategies for the development and release of new cultivars (Streck et al. 2018). Therefore, the meta-analysis technique proved most appropriate to evaluate long-term parameters in breeding programs, for not only taking into account the average of the parameters, but also their variation in the groups over the course of time.
According to the time meta-analysis, the Irrigated Rice Breeding Program of Minas Gerais has been exploiting good genetic variability in relation to total variability (heritability) for grain yield (h² = 53.74%) and high variability for plant height (h² = 74.17%) and days to flowering (h² = 91.62%). These results show that the progenies of high-performing parents will also tend to have a high performance (Borem et al. 2017), indicating a greater chance of success with selection in the breeding program (Vasconcelos et al. 2012). However, heritability decreased over time (Figure 2), suggesting that the narrowing of the genetic base of rice reported by Busanello et al. (2020) in southern Brazil was also confirmed in Minas Gerais, which requires an effort to increase parental diversity. Aside from the search for alternatives to increase genetic variability, superior cultivars must be crossed to form a base population, as recommended by Santos et al. (2019).
The experimental accuracy of the program proved adequate for grain yield (CV = 15.68%) and excellent for the other traits, according to the coefficients of variation. Moreover, it was stable (Figure 2), demonstrating the quality of the experimental installation and performance, allowing the expression of the variability and ensuring good data accuracy.
The annual gain for grain yield of the evaluated genotypes was 4.22 kg year-1 (Table 2), indicating the contribution of the selected genotypes to a higher rice yield. The genetic progress for grain yield evidenced by time meta-analysis and location means from 2004 to 2017 is shown in Figure 2. The variation in the averages according to the evaluated environment and year was wide and the annual genetic progress for grain yield was 0.08%. This reinforces the need for new breeding strategies to develop cultivars superior to those already released. However, in the studied period, the intensification of selection for grain quality attributes decreased the gain for grain yield (Streck et al. 2018). It is noteworthy that breeding programs seek to obtain genotypes with high productivity (Woyann et al. 2019) and that, despite this magnitude of genetic gain, the average grain yield (5,058.10 kg ha-1) produced by the breeding program was almost double the mean rice grain yield recorded in Minas Gerais in the 2018/2019 growing season (CONAB 2019).
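The reported relative progress can be checked against the slope/intercept ratio described in the methods. The short calculation below assumes that the intercept of the fixed regression model for grain yield equals the reported average grain yield of 5,058.10 kg ha-1:

```latex
\text{gain}\,(\%) = \frac{\beta_1}{\beta_0} \times 100
                  = \frac{4.22}{5058.10} \times 100
                  \approx 0.083\% \text{ per year}
```

This agrees with the annual genetic progress of 0.08% stated in the text.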

The stable averages of plant height and days to flowering show that the program has contributed to developing genotypes with adequate plant height and medium cycle in the period from 2004 to 2017, which is critical for better crop establishment and competition with weeds (Borem and Nakano 2015). In this period (2004-2017), compared with irrigated rice of the southern region (Streck et al. 2018) and upland rice in the central-north region (Breseghello et al. 2011), the genetic gain of the varieties of Minas Gerais was lower for grain yield, whereas the plants were shorter and the cycle later.
The Irrigated Rice Breeding Program of Minas Gerais, carried out by the partner institutions EPAMIG, Embrapa Rice and Beans and UFLA, was efficient in the analyzed period (2004 to 2017). Nine rice cultivars were released, five of which were destined for cultivation on irrigated floodplains and four for rice in upland areas (irrigated by rain or sprinkler irrigation) (Soares et al. 2018).