
Meta-analysis of the experimental coefficient of variation in wheat using the Bayesian and Frequentist approaches

ABSTRACT

A large set of variables is assessed for progeny selection in plant-breeding programs and in other agronomic fields. The meta-analysis of the coefficient of variation (CVe) produces information for researchers and breeders on the experimental quality of trials. This analysis can also be applied in the decision-making process of experimental planning, regarding the experimental design, the number of replications, and the treatments and plants/progenies to be measured. In this study, we evaluated the dataset distribution and the descriptive statistics of CVe through the Frequentist and Bayesian approaches, aiming to establish credible and confidence intervals. We submitted the CVe data of ten wheat (Triticum aestivum L.) traits, reported in 1,068 published articles, to the Bayesian and Frequentist analyses. Sample data were analyzed via Gamma and normal models. We selected the model with the lowest Akaike Information Criterion (AIC) value and then tested three link functions. In the Bayesian analysis, uniform distributions were used as non-informative priors for the Gamma distribution parameters, with three ranges of θ ~ U(a, b). Thus, the prior probability density function was given by p(θ) = 1/(b − a), for θ ∈ [a, b]. The Bayesian and Frequentist approaches with the Gamma model presented similar results for CVe; however, the Bayesian credible intervals were narrower than the Frequentist confidence intervals. The Gamma distribution fitted the CVe data better than the normal distribution. The credible and confidence intervals of CVe were successfully applied to wheat traits and could be used as experimental accuracy measurements in other experiments.

Triticum aestivum L.; CVe probability distribution; Gamma distribution; credible and confidence intervals

Introduction

In many countries, governmental authorities require the evaluation of a new cultivar in officially registered trials before its release. The estimated coefficient of variation (CVe) is usually essential for this process and has been used as a parameter of experimental quality (Piepho and Möhring, 2006). The CVe must present adequate levels depending on the species and the trait evaluated (Resende and Duarte, 2007; Arnhold and Milani, 2011). This measurement relates the estimated experimental error variance (σ̂²e) to the overall mean of the experiment (μ̂) and can be easily obtained as

$CV_e = \frac{\sqrt{\hat{\sigma}^2_e}}{\hat{\mu}} \times 100$
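As an illustration only, the sketch below computes the CVe from a hypothetical randomized complete block analysis in R; the data, factor names, and aov() call are assumptions for the example and do not come from the surveyed articles.

# Minimal sketch (hypothetical data): CVe from a randomized complete block analysis
set.seed(1)
dat <- data.frame(block = factor(rep(1:4, each = 5)),
                  gen   = factor(rep(1:5, times = 4)),
                  yield = rnorm(20, mean = 3500, sd = 300))
fit <- aov(yield ~ gen + block, data = dat)
mse <- deviance(fit) / df.residual(fit)    # estimate of the error variance
cve <- sqrt(mse) / mean(dat$yield) * 100   # CVe (%) = sqrt(error variance) / mean x 100
cve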

Classifications for the CVe magnitude have been proposed for several crops (Albert and Zhang, 2010; Fritsche-Neto et al., 2012; Couto et al., 2013; Aerts et al., 2015). The Frequentist approach has been used to estimate the CVe (Fritsche-Neto et al., 2012; Mora and Arriagada, 2016; Nardino et al., 2020); nevertheless, few proposals take into account the CVe distribution for a trait that follows a non-normal distribution. That is, robust methods are necessary, since the distribution is unknown in many cases.

Bayesian inference can be very useful to evaluate the CVe classification, since it allows estimating parameters and relating measurements of association in non-normally distributed data or when asymptotic assumptions are not appropriate due to sparse data or small sample sizes. However, no studies were found in the literature for the wheat crop (Triticum aestivum L.) that use the Bayesian approach for the CVe classification. The advantages of the Bayesian approach are mainly related to its independence from normally distributed data, considering that the parametric space of CVe (%) is > 0, whereas the normal distribution spans (−∞, +∞). Moreover, this method offers flexibility to choose the distribution of the dataset and to incorporate prior knowledge on model parameters (Silva et al., 2013).

In the literature, no studies were observed that cross-reference the Frequentist and Bayesian approaches regarding dataset distributions or descriptive statistics of the CVe in wheat. Therefore, we searched the leading Brazilian journals for CVe values of wheat traits via meta-analysis. The variables studied are relevant for breeding programs to select progenies and estimate genetic gain, as well as for studies on cultivar characterization in the plant sciences. Research on experimental quality in these and other fields via different statistical approaches is scientifically relevant. Here, we evaluated the distribution of the CVe and its descriptive statistics through the Frequentist and Bayesian approaches to establish credible and confidence intervals for ten wheat traits.

Materials and Methods

Data source

We surveyed 1,068 articles on wheat published between 1970 and 2020 in all editions of the most renowned Brazilian scientific journals (Table 1). These data strongly support statistical tests to establish criteria for CVe classification for the most frequently evaluated traits in wheat. We collected experimental CVe values from all journals listed in Table 1, accessing all articles on each journal's online page. The following search terms were used: Triticum aestivum L., wheat, trigo, coefficient of variation, CV %, Triticum.

Table 1
Database of the experimental coefficient of variation (CVe) used in the Bayesian and Frequentist analyses.

The CVe data were collected for the following traits: grain yield (GY, n = 990); days for flowering (DF, n = 98); grain yield per plant (GYP, n = 64); hundred-grain weight (HGW, n = 63); hectoliter weight (HW, n = 163); spike length (LS, n = 52); number of grains per spike (NGS, n = 115); number of spikelets per spike (NSPS, n = 76); plant height (PH, n = 209); and thousand grain weight (TGW, n = 142). We calculated the standard error of the mean (SEM) for the ten traits as SEM = σ/√n, and the inverse of the squared SEM, 1/SEM², was also obtained. Estimated values are presented in Table 1.
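A minimal sketch of these two estimators in R, using hypothetical CVe values for a single trait (the numbers are illustrative and not taken from Table 1):

# SEM and inverse squared SEM for a hypothetical sample of CVe values
cv_sample <- c(10.2, 14.5, 12.8, 15.1, 13.3)   # hypothetical CVe values (%)
n   <- length(cv_sample)
sem <- sd(cv_sample) / sqrt(n)                 # SEM = sigma / sqrt(n)
inv <- 1 / sem^2                               # 1 / SEM^2
c(SEM = sem, inv_SEM2 = inv)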

Database reviews

The initial data inspection revealed that 90.7 % of the CVe values were obtained from experiments arranged in a randomized complete block design (RCBD), 6.17 % from a completely randomized design (CRD), 2.53 % from experiments conducted in a lattice design (DLAT), and 0.6 % from experiments in a strip-plot design.

Prior to preparation and organization for the statistical analyses, the data were tabulated in an MS Excel spreadsheet containing the categorical variables journal, publication year, number of treatments, and number of replications, along with the experimental CVe values of the traits.

Statistical analyses

Frequentist statistics

Model selection

The goodness-of-fit of the models to the data was tested by the Akaike information criterion (AIC), as follows:

$AIC = -2\log L + 2p$

where p is the number of parameters and log L is the logarithm of the maximum value of the likelihood function. The best model has the smallest AIC, that is, the least information loss (Casella and Berger, 2002; Cavanaugh and Neath, 2019). We also tested three link functions (identity, log, and inverse) in the Gamma model, using the AIC to select the best link function.
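A minimal sketch of this computation, using hypothetical CVe values: the AIC is obtained by hand from the log-likelihood and compared with R's AIC() for intercept-only Gamma and normal fits (the data are illustrative, not from the meta-analysis).

# AIC = -2*logL + 2*p computed by hand and compared with AIC()
cv <- c(10.2, 14.5, 12.8, 15.1, 13.3, 11.9, 12.4)   # hypothetical CVe values (%)
fit_gamma <- glm(cv ~ 1, family = Gamma(link = "identity"))
fit_norm  <- glm(cv ~ 1, family = gaussian())
p <- attr(logLik(fit_gamma), "df")                  # number of estimated parameters
aic_manual <- -2 * as.numeric(logLik(fit_gamma)) + 2 * p
c(manual = aic_manual, gamma = AIC(fit_gamma), normal = AIC(fit_norm))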

Initially, the CVe data of each variable were used to obtain the data distributions and the projections of the normal and Gamma distributions. Then, a generalized linear model (GLM) was fitted (intercept only), assuming a Gamma distribution. We tested three link functions, using the glm() function of the R software, as follows:

glm(trait1 ~ 1, family = Gamma(link = "identity"))

glm(trait1 ~ 1, family = Gamma(link = "inverse"))

glm(trait1 ~ 1, family = Gamma(link = "log"))

We computed the lower confidence interval (LCi), the estimated mean, and the upper confidence interval (UCi).
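A minimal sketch of how the LCi, mean, and UCi can be obtained from the intercept-only Gamma GLM, mirroring the get_confint() helper in Appendix I (the CVe values below are hypothetical):

# LCi, mean, and UCi from an intercept-only Gamma GLM with identity link
cv  <- c(10.2, 14.5, 12.8, 15.1, 13.3, 11.9, 12.4)  # hypothetical CVe values (%)
fit <- glm(cv ~ 1, family = Gamma(link = "identity"))
ci  <- confint(fit)                                 # profile-likelihood 95 % limits
c(LCi = ci[[1]], Mean = coef(fit)[[1]], UCi = ci[[2]])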

The statistical analyses were carried out in the R software system (R Core Team, version 4.0.2), using the metan (Olivoto and Lúcio, 2020) and ggplot2 (Wickham, 2016) packages. The scripts used to carry out the analyses are given in Appendix I.

Bayesian approach

Uniform distributions were used as non-informative priors for the Gamma distribution parameters, where three ranges were evaluated: r ~ U(0, 5) and mu ~ U(0, 5); r ~ U(0, 10) and mu ~ U(0, 10); and r ~ U(0, 20) and mu ~ U(0, 20), with θ ~ U(a, b). The values of the deviance information criterion (DIC) were computed for each range. Thus, the prior probability density function was given by p(θ) = 1/(b − a), for θ ∈ [a, b]. Uniform distributions have been used in Bayesian analyses for both conceptual and practical reasons (Gelman, 2006). When assuming the Gamma distribution for the data and the uniform distribution for the Gamma parameters, the posterior density function was given as p(θ|y) ~ Gamma(θ|α, β), with density p(θ) = [β^α / Γ(α)] θ^(α−1) e^(−βθ), θ > 0, whose mean (expected value) is E(θ) = α/β (Gelman et al., 2004). We ran 10,000 iterations, with a burn-in of 1,000 iterations and a thinning interval of 10.
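A minimal sketch of the E(θ) = α/β relation, which is also how the posterior mean of the CVe is tracked in Appendix I (as the ratio of the MCMC draws of r and mu); the shape and rate values below are arbitrary:

# For a Gamma(shape = r, rate = mu) model, the expected value is r/mu
r  <- 20
mu <- 1.6
mean(rgamma(1e5, shape = r, rate = mu))   # simulated mean, close to r/mu = 12.5
r / mu                                    # analytical expected value
# with MCMC output, the posterior draws of the mean CVe would be sims[, "r"] / sims[, "mu"]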

The following statistical criteria of the Bayesian approach were used to develop the CVe classification of wheat crops: quantile 2.5 (q2.5), 1st quartile (q25), lower credible interval (LCi), posterior mean (Mean), upper credible interval (UCi), median (Md), standard deviation of the posterior (sd), 3rd quartile (q75), and quantile 97.5 (q97.5).

The Highest Posterior Density (HPD) interval was used to obtain the LCi and UCi with probability = 0.95. For the Bayesian analyses, we used the boa (Bayesian Output Analysis) package (Smith, 2007), as well as the OpenBUGS software and the R package R2OpenBUGS (Sturtz et al., 2005). The scripts for the Bayesian analyses are reported in Appendix I.
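A minimal sketch of the HPD computation with boa.hpd(), as used in Appendix I; the posterior draws below are simulated stand-ins, not the actual MCMC output:

# 95 % HPD limits (LCi, UCi) from posterior CVe draws
library(boa)
cve_draws <- rgamma(9000, shape = 20, rate = 1.6)   # stand-in for the MCMC draws
hpd <- boa.hpd(cve_draws, alpha = 0.05)
c(LCi = hpd[[1]], Mean = mean(cve_draws), UCi = hpd[[2]])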

Results

Convergence and model fit

We ran 10,000 iterations and discarded the first 1,000 as burn-in. The results for the remaining 9,000 iterations of the deviance, r, and mu parameters of the Gamma distribution are shown in the supplementary material (Appendix II A and B) for all the wheat traits evaluated in Table 2.

Table 2
Results of the deviance information criterion (DIC) analysis with three ranges for the uniform distributions of the r and mu prior parameters (DIC 1: 0 – 5, DIC 2: 0 – 10, and DIC 3: 0 – 20), and Akaike Information Criterion (AIC) for the Gamma and normal model fits.

We used three models with different ranges for the uniform distributions of the parameters r and mu (Table 2): 0 – 5 in DIC_1, 0 – 10 in DIC_2, and 0 – 20 in DIC_3. For each model, we computed the DIC of each trait. The values of DIC_1, DIC_2, and DIC_3 were, respectively: 6,229, 6,230 and 6,230 for GY; 403.9, 405.8 and 405.8 for DF; 6,229, 6,230 and 6,230 for GYP; 351, 352 and 352 for HGW; 726, 728 and 728 for TGW; 631, 633 and 633 for HW; 1,086, 1,087 and 1,087 for PH; 246, 247 and 247 for LS; 352, 354 and 354 for NSPS; and 664, 664 and 664 for NGS.

The DIC analysis of the three models for the ten traits revealed that model 1, with the 0 – 5 range for the uniform distributions, had the lowest DIC, indicating the best-fitting model, from which we obtained the descriptive statistics and credible intervals.

We obtained the posterior means of the three models and the credible intervals (LCI and UCI, 95 %) (Figure 1). The narrowest credible intervals were observed for Model 1 (r ~ U(0, 5) and mu ~ U(0, 5)). For the posterior mean, the largest difference among the three models was found for grain yield per plant (GYP), 0.07 %.

Figure 1
Posterior mean and credible intervals (LCI and UCI) for ten wheat traits, considering uniform distributions as non-informative priors with parameters Model_1: r ~ U(0, 5) and mu ~ U(0, 5); Model_2: r ~ U(0, 10) and mu ~ U(0, 10); and Model_3: r ~ U(0, 20) and mu ~ U(0, 20), where: days for the flowering (DF), grain yield (GY), grain yield per plant (GYP), hundred-grain weight (HGW), hectoliter weight (HW), spike length (LS), number of grains per spike (NGS), number of spikelets per spike (NSPS), plant height (PH), and thousand grain weight (TGW).

The Akaike Information Criterion (AIC) for the Gamma model under the Frequentist approach is shown in Table 2 for the ten traits. The AIC values were 6,233.3 for GY, 408 for DF, 433.4 for GYP, 355 for HGW, 731.1 for TGW, 636.7 for HW, 1,090.1 for PH, 249.4 for LS, 356.4 for NSPS, and 666.4 for NGS. In the Gamma model, three link functions were tested, and the AIC was used to select the best one. No differences were observed among the link functions tested, namely identity, log, and inverse. Thus, we selected the identity link function due to the ease of interpreting the parameters.

We fitted a normal model and a Gamma model to the data of the ten traits and obtained the AIC (Table 2). All AIC values were higher for the normal model: 6,462.8 for GY, 448.6 for DF, 448.6 for GYP, 377.84 for HGW, 871.7 for TGW, 894.6 for HW, 1,398.6 for PH, 259.3 for LS, 375.4 for NSPS, and 671.9 for NGS. The Gamma model presented lower AIC values and, thus, fitted the CVe values better than the normal model.

Comparison of the Bayesian and Frequentist approaches

The posterior distributions of CVe for the ten wheat traits evaluated are presented in Figure 2A. The posterior distributions are narrow. GYP showed the widest CVe range, between 10 % and 17.5 %, with a density above 0.4. The traits with the narrowest CVe ranges in the posterior distribution were DF, GY, and HW, with values of 3 – 4.5 %, 12 – 13.5 %, and 2 – 3.3 %, respectively, all with densities equal to or above 2. Figure 2B, which groups the traits in the same plot, shows that each variable has a specific distribution shape and variability.

Figure 2
Posterior distributions of the experimental CVe for ten wheat traits, individually (A) and jointly (B), where: days for the flowering (DF), grain yield (GY), grain yield per plant (GYP), hundred-grain weight (HGW), hectoliter weight (HW), spike length (LS), number of grains per spike (NGS), number of spikelets per spike (NSPS), plant height (PH), and thousand grain weight (TGW).

Figure 3 presents, for the ten wheat traits, the data distribution (salmon), the fitted Gamma distribution (blue line), and the fitted normal distribution (dotted line). The CVe data were better fitted by the Gamma distribution than by the normal distribution. We highlight the fit of the Gamma distribution for the traits DF, TGW, HW, and GY, as well as the different shapes of the CVe distributions among traits.

Figure 3
Frequentist distribution of the experimental CVe for ten wheat traits. The data distribution is shown in salmon, the Gamma distribution as a blue line, and the normal distribution as a dotted line, where: days for the flowering (DF), grain yield (GY), grain yield per plant (GYP), hundred-grain weight (HGW), hectoliter weight (HW), spike length (LS), number of grains per spike (NGS), number of spikelets per spike (NSPS), plant height (PH), and thousand grain weight (TGW).

Table 3 shows the descriptive statistics of the CVe of the ten traits evaluated in wheat by the Bayesian approach. We describe the results for the following descriptive statistics: quantile 2.5, 1st quartile, posterior mean, 3rd quartile, and quantile 97.5.

Table 3
Descriptive statistics of the published CVe data for ten wheat traits. q25 = first quartile; LCi = lower credible interval; UCi = upper credible interval; Mean = posterior mean ± posterior standard deviation; q75 = third quartile; and mean ± standard error.

The Bayesian approach obtained the following CVe values for the traits: days of flowering (DF, days), 3.68 %; grain yield (GY, kg ha−1), 12.64 %; grain yield per plant (GYP, g), 13.68 %; hectoliter weight (HW, g L−1), 2.65 %; hundred-grain weight (HGW, g), 8.07 %; spike length (LS, cm), 6.63 %; number of grains per spike (NGS, units), 11.02 %; number of spikelets per spike (NSPS, units), 6.12 %; plant height (PH, cm), 6.05 %; and thousand-grain weight (TGW, g), 5.92 %. For the quantile statistics, the Bayesian approach revealed that GYP had the highest values (q25: 13.61 %; q75: 13.84 %), which could be associated with assessments carried out at the individual plant level.

We observed lower CVe values for the traits HW (q25: 2.63 % and q75: 2.70 %) and DF (q25: 3.66 % and q75: 3.72 %).

The credible interval (LCi and UCi) refers to the Bayesian approach, while the confidence interval (CI) refers to the Frequentist approach (Figure 4). In both approaches, DF and HW had the lowest CVe values, ranging between 3 and 4 % for DF and between 2 and 3 % for HW. GY and GYP showed the highest CVe values, ranging between 12 and 13 % for GY and between 12 and 16 % for GYP; in both approaches, GY and GYP require attention in the experimental planning.

Figure 4
Results of the Bayesian and Frequentist approaches of experimental CVe for the credible and confidence intervals of ten traits in wheat, where: days for the flowering (DF), grain yield (GY), grain yield per plant (GYP), hundred-grain weight (HGW), hectoliter weight (HW), spike length (LS), number of grains per spike (NGS), number of spikelets per spike (NSPS), plant height (PH), and thousand grain weight (TGW).

The traits TGW, PH, NSPS, LS, HGW, and NGS revealed similar magnitudes for CVe, with values between 5 and 11 % for both approaches, considering the credible and confidence interval of 95 %.

The variable GYP presented the highest values for the range of the credible and confidence intervals of CVe. For the ten traits evaluated, the Bayesian credible intervals for the posterior mean were narrower than the Frequentist confidence intervals for the mean estimate, considering the Gamma model for the Frequentist approach and the Gamma model with non-informative priors for the Bayesian approach.

Discussion

The DIC values for the Bayesian approach, the AIC values for the Gamma model, and the AIC values for the normal distribution are presented in Table 2 for the CVe of the ten wheat traits. Based on the three DIC values, we selected the DIC 1 model, with uniform distribution parameters in the 0 – 5 range, due to its smaller deviance values. We observed differences between DIC 1 and the AIC of the Gamma model: the deviance values of the Bayesian model were lower by more than three units in eight of the ten traits evaluated.

On the other hand, the AIC values of the normal model were substantially higher than those of the Gamma model for all wheat variables evaluated. These results directly impact the CVe classification methods that use properties of the normal distribution. For the HW, PH, and TGW traits, for example, the difference between the AIC values was above 100 units. The AIC is a ubiquitous tool in statistical modeling and provides an estimate of the out-of-sample error based on information theory. The AIC estimates the relative amount of information lost by a model; that is, the less information a model loses, the higher its quality and the lower its AIC score. Model selection criteria provide a valuable tool to identify a model of appropriate structure and dimension among candidates and can be used to compare models based on different probability distributions for the outcome variable. A selection criterion assesses whether a fitted model offers an optimal balance between goodness-of-fit and parsimony (Cavanaugh and Neath, 2019).

The posterior and Frequentist distributions of the CVe of the evaluated traits presented some differences, mainly in the descriptive statistics. The CVe distribution of the different traits is an interesting and partially conclusive aspect, since the absence of a normal distribution of the CVe of wheat traits can be visually represented regardless of the sample size. The literature presents many methods and studies on the CVe of different species, and most presuppose that the data are normally distributed. This is not always true, since the Gamma distribution showed a better fit to the CVe data distribution for the ten traits evaluated here.

The distributions of CVe data presented in Figures 2 and 3 indicate a wide variability of CVe in wheat crops. The Bayesian and Frequentist approaches demonstrated contrasting distributions, while the estimated and posterior means, as well as the credible and confidence intervals, were similar. This significant variation justifies classifying the coefficient of variation for these traits individually (Costa et al., 2002; Nardino et al., 2020).

Non-normally distributed CVe is commonly observed, but this information is frequently neglected. The Bayesian approach has some advantages, such as flexibility in selecting the distributions for sample data and unknown parameters, as well as the possibility of incorporating prior knowledge about the parameters of the model (Sorensen and Gianola, 2002; Silva et al., 2013).

The values of the posterior mean and the mean of the Gamma model were similar. Along with the means of the Bayesian and Frequentist models, we report the posterior standard deviation and the standard error of the estimate. The values of these statistics were lower in the Bayesian model for seven of the ten traits studied. However, the mean magnitude contrasts strongly among the variables, with the lowest CVe mean for HW (2.6 %) and the highest for GYP (13.6 %). This demonstrates that the CVe magnitude is directly associated with the nature of the trait and its distribution. Some variables, such as GYP, NGS, and GY, presented high CVe magnitudes; therefore, the number of replications and/or plants per plot should be increased to reduce the magnitude of the experimental error.

In terms of the credible interval, the Bayesian approach revealed a narrower range than the Frequentist approach. Credible intervals, or credible regions, are built to quantify final precision; that is, they are conditioned on the observed data rather than on repetitions or hypothetical results (Resende et al., 2014). In this respect, significant criticism has been made of Frequentist confidence intervals (Murteira, 1995), since experiments are not likely to be exhaustively repeated.

We opted to use non-informative priors due to the different approaches observed in the literature on the CVe distribution. We also identified that the CVe distribution was highly dependent on the species studied and the variable measured, hindering the establishment of general intervals for CVe classification, as has been reported in the literature. Nevertheless, few studies have reported on the distribution of the CVe by comparing the confidence and credible intervals between the Bayesian and Frequentist approaches. The Bayesian approach provided narrower credible intervals than the Frequentist approach for most wheat traits studied. Using Bayesian inference, we obtained the posterior standard deviations and exact credible intervals of the CVe for each variable assessed through the meta-analysis.

The ten variables studied are frequently used in wheat research for cultivar phenotyping, phenotypic diversity, the selection of progenies and families in breeding programs, and the final screening for the release of new wheat cultivars. These traits are associated with grain yield (TGW, NGS, HGW, GYP, and GY), plant morphology (PH, NSPS, LS, and DF) and, indirectly, industrial quality (HW). In this sense, the results of this study also have applicability in different agronomic areas, including plant breeding. This study demonstrates the experimental quality of the trials based on the CVe magnitude. In addition, the results assist in the decision-making process for an experimental plan, such as the experimental design, the number of replications, and the treatments and plants/progenies to be measured.

Conclusions

This study obtained the CVe credible and confidence intervals for wheat traits, which could be used in experimental accuracy measurements of other experiments.

The posterior distributions of CVe for the ten wheat traits showed little variation among percentiles. The Gamma distribution presents a better fit to the CVe data distribution than the normal distribution.

The estimated and posterior means were similar between the Bayesian and Frequentist approaches. CVe values higher than 13 % are outside the confidence and credible intervals for grain yield in wheat.

References

  • Aerts, S.; Haesbroeck, G.; Ruwet, C. 2015. Multivariate coefficients of variation: comparison and influence functions. Journal of Multivariate Analysis 142: 183–198. https://doi.org/10.1016/j.jmva.2015.08.006
  • Albert, A.; Zhang, L. 2010. A novel definition of the multivariate coefficient of variation. Biometrical Journal 52: 667–675. https://doi.org/10.1002/bimj.201000030
  • Arnhold, E.; Milani, K.F. 2011. Rank-ordering coefficients of variation for popping expansion. Acta Scientiarum. Agronomy 33: 527-531. https://doi.org/10.4025/actasciagron.v33i3.11911
  • Casella, G.; Berger, R. 2002. Statistical Inference. 2ed. Thomson Learning, Duxbury, CA, USA.
  • Cavanaugh, J.E.; Neath, A.A. 2019. The Akaike information criterion: background, derivation, properties, application, interpretation, and refinements. Wiley Interdisciplinary Reviews: Computational Statistics 11: e1460. https://doi.org/10.1002/wics.1460
  • Costa, N.H.A.D.; Seraphin, J.C.; Zimmermann, F.J.P. 2002. A new method of variation coefficient classification for upland rice crop = Novo método de classificação de coeficientes de variação. Pesquisa Agropecuária Brasileira 37: 243–249 (in Portuguese, with abstract in English). https://doi.org/10.1590/S0100-204X2002000300003
  • Couto, M.F.; Peternelli, L.A.; Barbosa, M.H.P. 2013. Classification of the coefficients of variation for sugarcane crops. Ciência Rural 43: 957-961. https://doi.org/10.1590/S0103-84782013000600003
  • Fritsche-Neto, R.; Vieira, R.A.; Scapim, C.A.; Miranda, G.V.; Rezende, L.M. 2012. Updating the ranking of the coefficients of variation from maize experiments. Acta Scientiarum. Agronomy 34: 99-101. https://doi.org/10.4025/actasciagron.v34i1.13115
  • Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. 2004. Bayesian Data Analysis. 2ed. CRC Press, Boca Raton, FL, USA.
  • Gelman, A. 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1: 515–534. https://doi.org/10.1214/06-BA117A
  • Mora, F.; Arriagada, O. 2016. A classification proposal for coefficients of variation in Eucalyptus experiments involving survival, growth and wood quality variables. Bragantia 75: 263-267. https://doi.org/10.1590/1678-4499.458
  • Murteira, B.J.F. 1995. Introduction to Bayesian inference = Introdução à inferência bayesiana. Available at: http://hdl.handle.net/10362/7593 [Accessed Feb 11, 2021] (in Portuguese).
  • Nardino, M.; Pereira, J.M.; Marques, V.T.; D’Avila, F.C.; Franco, F.D.; Barros, W.S. 2020. Coefficient of variation: a new approach for the study in maize experiments. Revista Brasileira de Biometria 38: 185–206. https://doi.org/10.28951/rbb.v38i2.440
  • Olivoto, T.; Lúcio, A.D. 2020. metan: an R package for multi-environment trial analysis. Methods in Ecology and Evolution 11: 783–789. https://doi.org/10.1111/2041-210X.13384
  • Piepho, H-P.; Möhring, J. 2006. Selection in cultivar trials: is it ignorable? Crop Science 46: 192–201. https://doi.org/10.2135/cropsci2005.04-0038
  • Resende, M.D.V.; Duarte, J.B. 2007. Precision and quality control in variety trials. Pesquisa Agropecuária Tropical 37: 182-194 (in Portuguese, with abstract in English). https://doi.org/10.5216/pat.v37i3.1867
  • Resende, M.D.V.; Silva, F.F.; Azevedo, C.F. 2014. Mathematical Statistics, Biometric and Computational Statistics: Mixed, Multivariate, Category and Generalized Models (REML/BLUP), Bayesian Inference, Random Regression, Genomic Selection, QTL, GWAS, Spatial and Temporal Statistics, Competition, Survival = Estatística Matemática, Biométrica e Computacional: Modelos Mistos, Multivariados, Categorias e Generalizados (REML/BLUP), Inferência Bayesiana, Regressão Aleatória, Seleção Genômica, QTL, GWAS, Estatística Espacial e Temporal, Competição, Sobrevivência. UFV, Viçosa, MG, Brazil (in Portuguese).
  • Silva, F.F.; Viana, J.M.S.; Faria, V.R.; Resende, M.D.V. 2013. Bayesian inference of mixed models in quantitative genetics of crop species. Theoretical and Applied Genetics 126: 1749–1761. https://doi.org/10.1007/s00122-013-2089-6
  • Smith, B.J. 2007. boa: an R package for MCMC output convergence assessment and posterior inference. Journal of Statistical Software 21: 1–37. https://doi.org/10.18637/jss.v021.i11
  • Sorensen, D.; Gianola, D. 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer, Berlin, Germany. https://doi.org/10.1007/b98952
  • Sturtz, S.; Ligges, U.; Gelman, A. 2005. R2WinBUGS: a package for running WinBUGS from R. Journal of Statistical Software 12: 1–16. https://doi.org/10.18637/jss.v012.i03
  • Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-319-24277-4
  • Data availability statement
    The datasets generated and/or analyzed during the current study, as well as the codes used to reproduce the examples, are publicly available at: https://doi.org/10.17605/OSF.IO/8KTPA

Appendix

Appendix I – R program used to obtain the results

library(tidyverse)

library(metan)

library(rio)

library(ggrepel)

library(R2OpenBUGS)

library(boa)

library(ggformula)

The data and code can be obtained at: https://tiagoolivoto.github.io/paper_coefvar/code.html#2_Data

3 Bayesian

3.1 Function

data_cv <- import("http://bit.ly/data_cvs") %>% select(GY:NGS)

# Long format
data_cv_long <- data_cv %>%
  pivot_longer(everything(), names_to = "var", values_to = "cv") %>%
  remove_rows_na()

# samples per variable
data_cv_long %>% n_by(var)

# create a list of traits with no missing values
df <- lapply(data_cv, remove_rows_na)

bayes <- function(df){
  linemodel <- function(){
    for (i in 1:64) { # change the number of samples for each variable
      y[i] ~ dgamma(r, mu)
    }
    r ~ dunif(0, 5)
    mu ~ dunif(0, 5)
  }
  # Specification of the data
  linedata <- list(y = df[[1]])
  # Specification of the initial values
  lineinits <- function(){ list(r = 0.5, mu = 1) }
  # Specification of the parameters
  parameters <- c("r", "mu")
  # Execution of the analysis with the bugs() function of R2OpenBUGS
  Niter <- 10000
  Nburn <- 1000
  Nthin <- 10
  modelo <- bugs(data = linedata, inits = lineinits,
                 parameters.to.save = parameters, model.file = linemodel,
                 n.chains = 1, n.iter = Niter, n.burnin = Nburn,
                 n.thin = Nthin, debug = TRUE)
  # posterior draws of the mean CVe: r / mu
  return(modelo$sims.matrix[, 1] / modelo$sims.matrix[, 2])
}

3.2 Posterior distribution

GY <- bayes(df$GY)
GYP <- bayes(df$GYP)
HGW <- bayes(df$HGW)
TGW <- bayes(df$TGW)
HW <- bayes(df$HW)
DF <- bayes(df$DF)
PH <- bayes(df$PH)
LS <- bayes(df$LS)
NSPS <- bayes(df$NSps)
NGS <- bayes(df$NGS)

3.3 Credibility intervals and mean posterior for each trait

posterior <- import("http://bit.ly/data_posterior")
conf_int_bayes <- sapply(posterior, function(x){
  conf_int <- boa.hpd(x, 0.05)
  data.frame(LCI = conf_int[[1]], MEAN = mean(x), UCI = conf_int[[2]])
}) %>%
  t()

3.4 Marginal posterior density

posterior_long <- posterior %>%
  pivot_longer(everything())
ggplot(posterior_long, aes(value)) +
  geom_density(fill = "red", alpha = 0.5, size = 0.1) +
  facet_wrap(~ name, scales = "free_y", ncol = 5) +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank(),
        axis.text = element_text(color = "black"),
        axis.ticks = element_line(color = "black"),
        axis.ticks.length = unit(0.15, "cm")) +
  labs(x = "Coefficient of variation (%)", y = "Density")
ggsave("figs/fig1_posterior.jpg", dpi = 600, width = 25, height = 10, units = "cm")

# An alternative plot
ggplot(posterior_long, aes(value)) +
  geom_density(aes(fill = name), alpha = 0.5) +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank(),
        axis.text = element_text(color = "black"),
        axis.ticks = element_line(color = "black"),
        axis.ticks.length = unit(0.15, "cm")) +
  labs(x = "Coefficient of variation (%)", y = "Density")
ggsave("figs/fig1_posterior2.jpg", dpi = 600, width = 25, height = 10, units = "cm")

4 Frequentist

4.1 Confidence interval

get_confint <- function(df, var){
  if(is.grouped_df(df)){
    results <- doo(df, get_confint, var = {{var}})
    return(results)
  }
  values <- na.omit(df %>% select_cols({{var}}) %>% pull())
  model <- glm(values ~ 1, family = Gamma(link = "identity"))
  conf <- confint(model)
  MEAN <- coef(model)[[1]]
  LCI <- conf[[1]]
  UCI <- conf[[2]]
  data.frame(LCI = LCI, MEAN = MEAN, UCI = UCI)
}
freq_lim <- data_cv_long %>%
  group_by(var) %>%
  get_confint(cv)

p <- gf_density(~ cv | var, data = data_cv_long, fill = "red", alpha = 0.5) %>%
  gf_fitdistr(linetype = 2) %>%
  gf_fitdistr(dist = "gamma", color = "blue")
p + facet_wrap(~ var, nrow = 2, scales = "free_y") +
  theme(panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"),
        axis.ticks = element_line(color = "black"),
        axis.ticks.length = unit(0.15, "cm")) +
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  labs(x = "Coefficient of variation (%)", y = "Density")
ggsave("figs/fig2_density.jpg", dpi = 600, width = 25, height = 10, units = "cm")

5 Results

5.1 Confidence interval

df_confint <- import("http://bit.ly/data_confint")
ggplot(df_confint, aes(MEAN, fct_rev(VAR), color = APPROACH)) +
  geom_point(position = position_dodge(width = 0.7), size = 2) +
  geom_errorbarh(aes(xmin = LCI, xmax = UCI),
                 position = position_dodge(width = 0.7), width = 0.3) +
  scale_x_continuous(breaks = seq(2, 19, by = 2), expand = c(0.15, 0.15)) +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank(),
        axis.text = element_text(color = "black"),
        axis.title = element_text(color = "black"),
        axis.ticks = element_line(color = "black"),
        axis.ticks.length = unit(0.15, "cm")) +
  labs(x = "Coefficient of variation (%)", y = "Variable") +
  geom_text(aes(label = round(LCI, 2), x = LCI),
            position = position_dodge(width = 0.7), hjust = 1.2, size = 2.5,
            show.legend = FALSE) +
  geom_text(aes(label = round(UCI, 2), x = UCI),
            position = position_dodge(width = 0.7), hjust = -0.3, size = 2.5,
            show.legend = FALSE)
ggsave("figs/fig3_confidence.jpg", dpi = 600, width = 10, height = 12, units = "cm")

Appendix II
(A) Trace plots over the iterations of the Gamma distribution parameters (r, mu, and deviance) from the Bayesian analysis for the traits: days for the flowering (DF); grain yield (GY); grain yield per plant (GYP); hundred-grain weight (HGW); hectoliter weight (HW); and spike length (LS).

Appendix II
(B) Trace plots over the iterations of the Gamma distribution parameters (r, mu, and deviance) from the Bayesian analysis for the traits: number of grains per spike (NGS), number of spikelets per spike (NSPS), plant height (PH), and thousand grain weight (TGW).

Edited by

Edited by: Thomas Kumke

Publication Dates

  • Publication in this collection
    24 June 2022
  • Date of issue
    2023

History

  • Received
    28 July 2021
  • Accepted
    13 Apr 2022