How to analyze germination of species with empty seeds using contemporary statistical methods?

Santana, Denise Garcia de; Carvalho, Fábio Janoni; Toorop, Peter

doi:10.1590/0102-33062017abb0322

ABSTRACT

Statistical analysis is considered an important tool for scientific studies, including those on seeds. However, seed scientists and statisticians often disagree on the nature of the variables addressed in germination experiments. Statisticians consider the number of germinated seeds to be a binomially distributed variable, whereas seed scientists convert it into a percentage and often analyze it as a normally distributed variable. The requirement of fit to the normal distribution restricts the analysis of variance models that can be used. Lack of fit calls for nonparametric tests, but these are known for their inferential problems. Generalized Linear Models (GLMs) can provide a better fit to germination variables for any species, including Lychnophora ericoides Mart., because they allow a wider range of probability distributions with fewer requirements. Here we suggest the use of relative germination in addition to absolute germination for species with seed development problems, such as L. ericoides and others from the campos rupestres. This paper introduces the most current statistical advancements and increases the possibilities for their application in seed science research.

Keywords:
Brazilian arnica; campos rupestres; data transformation; deviance analysis; Generalized Linear Models; relative germination

Introduction

Germination is the most frequently described trait in seed science, and methods of statistical analysis for such studies, including analysis of variance (ANOVA), Student's t-test, Tukey, Scott-Knott, Kruskal-Wallis, and Mann-Whitney tests, appear to be validated by their continued use. However, the perspective of researchers who perform seed biology research is not shared by statisticians, primarily because of the nature of the variables involved in germination. Seed scientists convert the number of germinated seeds into a percentage and attribute an assumed normal distribution to this variable, for which ANOVA is the preferred statistical method (Sileshi 2012). Developed by Fisher between 1920 and 1935, ANOVA models emphasize the importance of repetition, randomization and local control for experimental efficiency (Fisher 1925; 1934). Based on a normal linear model, ANOVA requires homoscedasticity, independent residuals and additivity of treatment and block effects (Steel & Torrie 1980; Sokal & Rohlf 1995). Therefore, some researchers have opted to apply the angular transformation, widely known as $\arcsin \sqrt{\frac{x}{100}}$, in an attempt to bring the germination percentage (x) closer to a normal distribution, stabilize variances and produce independent residuals, without realizing that inferences can be impaired (Warton & Hui 2011; Sileshi 2012). A large group of researchers believe that biological data do not fit the normal distribution, and so they choose non-parametric tests (such as Kruskal-Wallis and Mann-Whitney).

Violations of ANOVA’s assumptions, the low statistical power of nonparametric tests and criticism of data transformation (Warton & Hui 2011) have led to the development of statistical models with fewer requirements, based on distributions other than the normal. Generalized Linear Models (GLMs) are a family of statistical models built on more flexible probability distributions, of which ANOVA is a particular case (Nelder & Wedderburn 1972; Wilson & Hardy 2002; Warton & Hui 2011). GLMs accommodate several error distributions, including the binomial, Poisson, negative binomial and beta-binomial, and thus offer more alternatives for modeling variables in seed studies. The binomial is a natural candidate because of the binary nature of germination (i.e. seeds either germinate or do not germinate), the fixed number of seeds, the independence of germination (the chance that a seed germinates is not affected by the other seeds) and the variation in germination occurrence.

In scientific research, the statistical advances involving count variables are recognized, but some peculiarities of seed formation in some Brazilian species may limit the analysis of the true germination potential of the seeds. For species distributed on quartzitic campos rupestres, the combination of shallow, extremely impoverished soils (Benites et al. 2003; Oliveira et al. 2015) with pollen limitation and genetic load may result in a trade-off between sexual and asexual reproduction. These characteristics can affect the seed development process and increase the frequency of embryoless seeds. Empty seeds are phylogenetically skewed and occur in several families (Dayrell et al. 2016). As a result, it is common to attribute, albeit incorrectly, low percentages of germination to certain campos rupestres species.

The percentage of seeds with embryos can be increased through physical separation methods such as an aspirator seed cleaning machine, sieves or X-ray techniques. By identifying seeds without an embryo, such techniques have a direct impact on germination levels (Tonetti et al. 2006). However, the absence of an embryo is not easily perceived or quantified for most campos rupestres seeds, and it is sometimes difficult to separate seeds with and without an embryo in the samples.

An alternative for most species is to quantify the number of seeds with embryos regardless of whether the embryo is dead or not. In this case, the number of germinated seeds is divided not by the sample size (e.g., 25, 50 or 100 seeds) but by the number of seeds with an embryo (sum of germinated, dormant and dead seeds). This relative germination minimizes underestimates of the germination potential of species when embryoless seeds are present in the sample. Researchers may question the need to calculate the relative germination percentage, because embryoless seeds are considered an impure fraction of the sample. This impure fraction approach is common among seed technologists, but is not always shared by botanists interested in determining the causes of embryoless seeds. Relative germination does not diminish the importance of the germination percentage as it is usually calculated; only the denominator differs, being standardized to seeds with an embryo.

The evolution of statistical analysis of seed germination, from nonparametric tests to Generalized Linear Models, is the subject of this paper. We also aimed to describe a relative measure of germination for L. ericoides seeds, as an example for other species from campos rupestres with embryo development problems.

Materials and methods

To develop a chronology of statistical advancements in germination, an experiment was performed with seeds of Lychnophora ericoides Mart. (Asteraceae), a species in a genus that is endemic to campos rupestres of the Brazilian states of Bahia, Goiás and Minas Gerais (Semir 1991). Lychnophora ericoides capitula were collected in 2007 from individuals at altitudes between 1,102 and 1,245 m in a population in Serra da Bocaina, Brazil, which is formed by quartzite mountains with a maximum altitude of 1,350 m. The average maximum and minimum temperatures are 26.5 and 15.7 °C, respectively, and the annual rainfall is 1,574.7 mm.

The germination experiment was conducted in a germination chamber (Seedburo Equipment Company), set to a mean day temperature of 26.3 °C and a mean night temperature of 23.8 °C with a mean irradiance of 114.82 ± 8.36 µmol m^-2 s^-1. Seeds of 20 individuals of L. ericoides were distributed in a completely randomized design with four repetitions, formed by 80 plots containing 50 seeds each (200 seeds per individual). Germination was scored daily until 70 days after sowing, and the evaluation criterion was radicle protrusion. After germination ceased, the non-germinated seeds were cut to determine the number of seeds with and without an embryo per individual. Seed mortality was scored only at the end of the experiment and, if containing an embryo, added to the number of viable seeds with an embryo, indicated by tetrazolium test, to calculate the relative germination percentage. The characteristics analyzed were germination percentage over 50 seeds and relative germination percentage over the number of seeds with an embryo.

The results were analyzed in detail using normal and binomial distributions (GLM) and nonparametric tests. Because the species shares the main characteristics of populations distributed in campos rupestres, nothing prevents these statistical routines from being applied to other species.

Relative germination percentage was obtained by the quotient between the number of germinated seeds and the number of seeds with an embryo (germinated, not germinated or dead):

$RG\,(\%) = \frac{\text{Number of germinated seeds}}{\text{Number of seeds with an embryo}} \times 100$

The usual expression is divided by the sample size:

$G\,(\%) = \frac{\text{Number of germinated seeds}}{\text{Number of seeds per sample}} \times 100$
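As a minimal sketch of the two expressions above, the Python function below computes G and RG for a single plot. The function name and the counts in the example are hypothetical and serve only to illustrate the arithmetic; they are not data from this experiment.

```python
def germination_percentages(germinated, sown, embryoless):
    """Return (G, RG) in percent for one plot.

    G divides by the number of seeds sown; RG divides by the number of
    seeds with an embryo (sown minus embryoless).
    """
    if not 0 <= embryoless <= sown:
        raise ValueError("embryoless must lie between 0 and sown")
    with_embryo = sown - embryoless
    if not 0 <= germinated <= with_embryo:
        raise ValueError("germinated cannot exceed the seeds with an embryo")
    g = 100.0 * germinated / sown
    # RG is undefined when no seed in the plot has an embryo.
    rg = 100.0 * germinated / with_embryo if with_embryo > 0 else float("nan")
    return g, rg


# Hypothetical plot: 50 seeds sown, 42 without an embryo, 6 germinated.
g, rg = germination_percentages(germinated=6, sown=50, embryoless=42)
print(f"G = {g:.1f} %, RG = {rg:.1f} %")
```

In this illustrative plot, G is 12 % while RG is 75 %, because only 8 of the 50 seeds contained an embryo.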

Initially, the experimental results were statistically analyzed by converting the number of germinated seeds into a percentage. The fit of the residuals to the normal distribution was checked with the Kolmogorov-Smirnov (K-S) test, and variance homogeneity was checked with the Levene (F) test. The assumptions were not met, and the $\arcsin \sqrt{\frac{x}{100}}$ transformation was performed. Because of the lack of normality of the residuals, the Kruskal-Wallis test, a nonparametric counterpart of ANOVA, was also applied, with Dunn’s test for multiple comparisons, to both variables: germination percentage (G) and relative germination (RG) of L. ericoides seeds. Even with the violated assumptions, inferences on the germination capacity of seeds from L. ericoides individuals were made by GLM/ANOVA with the Scott-Knott test for comparisons, to demonstrate the statistical consequences of using this approach.
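For readers unfamiliar with the angular transformation, a minimal Python sketch follows. Returning the result in degrees is an assumption made here for illustration (the text does not state the unit), and the function name is hypothetical.

```python
import math


def arcsine_sqrt(percent):
    """Angular transformation of a percentage in [0, 100].

    The result is returned in degrees (an assumption; radians are
    equally common in practice).
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(percent / 100.0)))


# The transformation stretches the scale near 0 and 100 %.
for x in (0, 25, 50, 100):
    print(x, round(arcsine_sqrt(x), 2))
```

Inferences made on this scale refer to transformed values rather than to germination percentages themselves, which is part of the criticism cited above.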

For GLM application, the random component for the germination experiment of L. ericoides seeds was expressed in two forms: as a percentage that assumed normal distribution and as a number of germinated seeds that assumed binomial distribution. Both distributions belong to the parametric exponential family and present probability density functions defined as follows.

Normal: $f(y; \mu, \sigma^{2}) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left[-\frac{1}{2} \left(\frac{y - \mu}{\sigma}\right)^{2}\right], \quad \mu \in \mathbb{R}$,

where y is the germination percentage (G or RG); $\mu$ is the mean; $\sigma^{2}$ is the variance; and $\pi$ is the mathematical constant approximated by 3.14;

and

Binomial: $f(y; \pi) = \binom{n}{y} \pi^{y} (1 - \pi)^{n - y}, \quad 0 \leq \pi \leq 1$,

where y is the number of germinated seeds; n is the number of seeds per repetition (total seeds or only those with an embryo); and $\pi$ is the proportion of germinated seeds.
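The binomial function above can be evaluated directly. The sketch below is illustrative only: the values n = 50 and a proportion of 0.12 are hypothetical choices, and the function name is not taken from any package.

```python
from math import comb


def binomial_pmf(y, n, pi):
    """Probability of y germinated seeds out of n, each with probability pi."""
    return comb(n, y) * pi**y * (1.0 - pi) ** (n - y)


# Hypothetical plot of 50 seeds with a germination proportion of 0.12.
n, pi = 50, 0.12
probs = [binomial_pmf(y, n, pi) for y in range(n + 1)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
print(f"sum of probabilities = {total:.6f}, expected count = {mean:.2f}")
```

The probabilities sum to one and the expected count equals n times the proportion, which is the quantity a binomial model estimates for each individual.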

The explanatory variable, represented by L. ericoides individuals and their effect on germination, corresponded to the systematic component of the model. For the binomial distribution, the link function adopted was the logit, and for the normal distribution it was the identity. Normal plots provided a graphical comparison between the studentized deviance component and the observed quantiles of the sample. Graphs were generated with 95 % confidence intervals to support inference in the visual analysis. In all tests, the significance level was 0.05.
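The logit link maps a proportion between 0 and 1 onto the whole real line, where the linear predictor lives, and its inverse maps the predictor back to a proportion. A minimal sketch, with hypothetical function names:

```python
import math


def logit(p):
    """Logit link: log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: converts a linear predictor back to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


# Round trip for an illustrative proportion of 0.12.
eta = logit(0.12)
print(f"logit(0.12) = {eta:.4f}, back-transformed = {inv_logit(eta):.2f}")
```

Because the inverse link always returns a value between 0 and 1, fitted proportions cannot fall outside the admissible range, unlike fitted means under the identity link.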

Results

A brief comparison between the means of the absolute percentage (G) and the relative percentage (RG) of germination showed evidence that the germination potential of some individuals was underestimated when germination was based on the number of seeds sown and not on the number of seeds with an embryo (Tab. 1). Individual 7 was erroneously rated as inefficient by absolute germination (12 %), although it showed a high relative germination capacity (RG = 78.1 %). A high relative germination potential was also observed for individuals 11, 13 and 20. The relative percentages did not underestimate the germination potential of L. ericoides seeds and provided improved discrimination among individuals. However, some individuals presented both low absolute (G) and low relative (RG) germination (e.g., individuals 1, 5 and 14).


Table 1
Means of absolute and relative germination (G and RG) of Lychnophora ericoides seeds, including percentage of embryoless seeds.

The first check of model assumptions returned low probabilities for the Kolmogorov-Smirnov and Levene tests (P < 0.05), demonstrating non-normal residuals and heterogeneous variances for the germination percentage (G) and relative germination (RG) of L. ericoides seeds, on both the original and the transformed scale (Tab. 2). The GLM/ANOVA model was applied to G and RG data despite the violation of the assumptions. The non-parametric Kruskal-Wallis (KW) test was also applied. In both procedures, probabilities lower than 0.05, associated with Snedecor's F statistic and the Kruskal-Wallis H statistic, indicated one or more differences in germination (G and RG) among L. ericoides individuals 1 to 20 (Tab. 3). An important detail of GLM/ANOVA was the high coefficients of variation for both germination percentages (80.5 and 71.4 %), sensu Pimentel-Gomes (2000).


Table 2
Kolmogorov-Smirnov (K-S) test results for normality of residuals and Levene's F statistics for homogeneity of variances for germination percentages (G and RG) of Lychnophora ericoides seeds.


Table 3
Analysis of variance (ANOVA) and Kruskal-Wallis test to determine the germination percentages of Lychnophora ericoides (Asteraceae) seeds from 20 individuals of a population endemic to campos rupestres.

Tests for multiple comparisons were expected to identify at least one difference between means or medians, because the hypothesis of equal germination capacities of L. ericoides individuals was rejected. In fact, the Scott-Knott test identified differences among individuals of L. ericoides, although these differences were not detected by Dunn's non-parametric test (Tab. 4). The Scott-Knott test separated individuals of the species into two groups according to the germination percentage: seeds with germination means equal to or less than 6.5 % and seeds with germination between 9 and 15 % (Tab. 4). With regard to the relative percentage, three groups were formed: group I, seeds with percentages higher than 66.9 %; group II, seeds with percentages between 41.8 and 52.4 %; and group III, seeds with percentages lower than 30.8 %. The results suggest that the Scott-Knott results were reliable even with residuals that did not fit the normal distribution and in the presence of heterogeneous variances.


Table 4
Results of the Scott-Knott test for multiple comparisons, performed with means, and Dunn's test, performed with ranks, represented by the medians of seed germination (G and RG) of Lychnophora ericoides.

Dunn's non-parametric test was not able to detect differences from 0 to 15 % in the absolute percentage of germination and from 0 to 81.3 % in the medians of the relative percentage of L. ericoides seeds (Tab. 4). It should be noted that the test that preceded it, KW, indicated that there was at least one difference between the medians for the two characteristics, G and RG (Tab. 3). The question arises whether the non-parametric tests form an alternative when GLM/ANOVA’s assumptions are not met.
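For reference, the H statistic of the Kruskal-Wallis test can be sketched in a few lines of Python. The version below uses average ranks and the usual correction for ties, which matter for germination counts with many repeated values. The data are hypothetical, and neither Dunn's test nor P values are reproduced here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups is a list of lists, one list of observations per treatment.
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Tied observations share the mean of the ranks they occupy.
    rank_of = {}
    first_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = first_rank + (t - 1) / 2.0
        first_rank += t

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    tie_correction = 1.0 - sum(t**3 - t for t in counts.values()) / (
        n_total**3 - n_total
    )
    if tie_correction == 0.0:
        raise ValueError("all observations are identical; H is undefined")
    return h / tie_correction


# Hypothetical germination percentages for three individuals.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}")
```

Because the statistic depends only on ranks, the size of the differences between percentages plays no role once their order is fixed.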

A simple comparison between the parametric (GLM/ANOVA, Scott-Knott) and non-parametric tests (KW and Dunn) showed a greater coherence of parametric tests, even though model assumptions were not met. The results question the efficiency of non-parametric tests. In this case, data can be analyzed by means of GLM for probability distributions other than the normal distribution.

It must be considered that the GLM with normal distribution and identity link function (Tab. 5) is the conventional ANOVA (Tab. 3). In this analysis, as shown previously, the P values were less than 0.05. For the GLM with binomial distribution and logit link function, the G and RG variables also had P values lower than 0.05 (Tab. 5).


Table 5
Analysis of deviance (ANODEV) for germination of Lychnophora ericoides seeds in which the variable is expressed as percentage adjusted to normal distribution or number of seeds adjusted to binomial distribution.
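The logic of an analysis of deviance for a binomial model with a single factor can be sketched without a fitting routine, because the maximum likelihood estimate for each level of the factor is simply its pooled proportion. The Python code below is an illustration under that assumption, not the routine used to produce Tab. 5; the function names and counts are hypothetical.

```python
import math


def _term(observed, fitted):
    """observed * log(observed / fitted), with the convention 0 * log(0) = 0."""
    return 0.0 if observed == 0 else observed * math.log(observed / fitted)


def binomial_deviance(y, n, p):
    """Deviance of fitted probabilities p for counts y out of n seeds."""
    return 2.0 * sum(
        _term(yi, ni * pi) + _term(ni - yi, ni * (1.0 - pi))
        for yi, ni, pi in zip(y, n, p)
    )


def anodev_one_factor(y, n, group):
    """Deviance explained by a single factor and its degrees of freedom."""
    totals = {}
    for yi, ni, gi in zip(y, n, group):
        ty, tn = totals.get(gi, (0, 0))
        totals[gi] = (ty + yi, tn + ni)
    # One parameter per level: the estimate is the pooled proportion.
    p_group = {g: ty / tn for g, (ty, tn) in totals.items()}
    p_null = sum(y) / sum(n)
    null_dev = binomial_deviance(y, n, [p_null] * len(y))
    resid_dev = binomial_deviance(y, n, [p_group[g] for g in group])
    return null_dev - resid_dev, len(totals) - 1


# Hypothetical counts: three individuals, two plots of 50 seeds each.
y = [1, 3, 10, 12, 0, 0]
n = [50] * 6
group = ["A", "A", "B", "B", "C", "C"]
deviance, df = anodev_one_factor(y, n, group)
print(f"deviance = {deviance:.2f} on {df} degrees of freedom")
```

The reduction in deviance is usually compared with a chi-square distribution with the stated degrees of freedom. For RG, n would hold the number of seeds with an embryo in each plot instead of the number sown.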

The inferential similarity of the models did not imply a similarity of goodness of fit. The binomial distribution was considered to have a better fit according to the quantile-quantile graphical analysis (Fig. 1). For the normal distribution, observations fell outside the simulated envelope, mainly at its ends. Such departures were only moderate in the binomial model.

Figure 1
Q-Q plot graphs to identify deviation of the data in relation to the normal and binomial distributions. The points represent studentized residual deviation and red dashed lines the 95 % confidence intervals at 0.05 significance.

Discussion

The correction of the germination percentage for the number of seeds with embryos, rather than the number of seeds in the sample, is necessary for botanical families with problematic seed formation, such as Asteraceae, Cyperaceae, Melastomataceae and Poaceae (Dayrell et al. 2016). We show here that Lychnophora ericoides is a species that represents this issue. It cannot be inferred that the seeds of these species have low germination potential. What can be inferred is that some individuals produce large amounts of seeds without embryos but, when the embryo is present, seeds of some individuals show high germinability. This seed development problem is important for understanding the reproductive potential of a species, and it should be considered in statistical analysis for species that produce large amounts of embryoless seeds.

The detection of these empty seeds in the planning phase is not simple; only with imbibition during the germination process does the absence of the embryo become evident. Moreover, the production of empty seeds is highly variable and unpredictable, at both the individual tree and the population level (Perea et al. 2013). One of the consequences of the unquantified presence of empty seeds is the recurrent and ineffective planning of experiments based on germination methods, especially those designed to overcome dormancy in order to increase germination percentages.

Labeled as inappropriate, aberrant and outdated, data transformation has received the harshest criticism (Sakia 1992; Wilcox 1998; Sileshi 2007; 2012; Osborne 2010). The question is why ANOVA models (with or without transformations) are still the most widely applied statistical tools, despite their restrictions and the availability of other GLMs (Nelder & Wedderburn 1972; Sileshi 2012). Two factors seem to motivate the use of outdated techniques. One is the many statistical programs with procedures anchored in the normal model. The other is the long-standing belief that some scientific variables naturally lack fit to the normal distribution, together with the historical investment in non-parametric techniques.

Normality is an important assumption in the theory of linear models, and deviation from it can lead to losses of efficiency in the analysis of variance. This loss can be recovered if the true distribution is known and used instead of the normal, which is the purpose of Generalized Linear Models (McCulloch et al. 2008). However, some researchers promote the robustness of GLM/ANOVA to small deviations from normality and even recommend not verifying this assumption (Sharpe 1970; Harwell et al. 1992; Driscoll 1996; Faraway 2006). The problem with this approach is that the limits of a small deviation from normality cannot be defined. The low probabilities of the K-S test indicate a large deviation of our data from the normal distribution, which makes the robustness of GLM/ANOVA questionable for species such as L. ericoides.

The lack of fit of the germination percentages (G and RG) to a normal distribution could be attributed to L. ericoides being an endemic, non-domesticated species from campos rupestres. Several records in the literature rule out the fit of ecological data to a normal distribution (Hampel et al. 1986; Austin 1987; Biondini et al. 1988). However, the origin of the data is not sufficient to judge the fit to a normal distribution, and therefore specific tests need to be performed. A survey of germination articles published between 2000 and 2011 revealed that, of the experiments that used ANOVA as a statistical tool, only 19.5 % tested the assumption of normality of residuals (Sileshi 2012). Based on this reference, the fit to a normal distribution is unknown in about 80 % of the publications, which makes it impossible to generalize that native or even cultivated species have non-normal germination data.

Our results also pointed to the inefficiency of the angular transformation in bringing the residuals of both variables closer to the normal distribution and in stabilizing variances. The criticisms of this statistical resource are severe, relating not only to its application but also to the interpretation of results on a different scale (Ahrens et al. 1990; Fernandez 1992; Sakia 1992; Sileshi 2007; 2012; Jaeger 2008; Warton & Hui 2011; Valcu & Valcu 2011). Although widely criticized, data transformation is the statistical resource most widely used in germination articles in an attempt to correct deviations from the normal distribution. The failure to meet the assumptions on the transformed scale for L. ericoides seed germination is evidence that this attempt may not be effective.

Until the introduction and use of GLMs, non-parametric tests such as Kruskal-Wallis (KW), Mann-Whitney, Friedman and Dunn were the only statistical tools available for analyses of seed germination of species whose data failed to meet ANOVA’s assumptions, such as L. ericoides. In fact, failing to meet the assumptions may lead to loss of test reliability, problems with Type I or II errors (Glass et al. 1972; Bradley 1978; Levine & Dunlap 1983; Rasmussen 1985) and problems with the significance level of the test (Kempthorne 1952; Little & Hills 1978; Gomez & Gomez 1984). However, these problems are amplified in non-parametric tests. Differences in RG on the order of 80 % between individuals that were not detected by Dunn's test are part of this problem. Although this result might seem circumstantial and particular to the species, it provides numerical support for simulations that have indicated the inefficiency and lower power of non-parametric tests. Observations of these problems with non-parametric tests are not recent. Box (1953) stated: "I do not think that we need necessarily go to the extreme of using non-parametric tests when it may well be that more powerful robust parametric tests can be found".

The problem of statistical analysis in seed germination lies not in the parametric or non-parametric label, but in the nature of the variable involved and in the probability distribution associated with it. Although historically analyzed as a continuous variable expressed as a percentage and associated with the normal distribution, the germination data of L. ericoides (G or RG) are discrete. Germination counts with a fixed n, given by the total number of seeds or the number of seeds with an embryo, follow a binomial distribution. Many authors warn that GLM/ANOVA is not indicated for data of a binomial nature (Zhao et al. 2001; Agresti 2002; Warton & Hui 2011).
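A short worked step, added here as an illustration, shows why a binomial variable cannot be expected to satisfy homoscedasticity. If y germinated seeds out of n follow the binomial distribution defined in Materials and methods, the observed proportion has

$$E\left(\frac{y}{n}\right) = \pi, \qquad \mathrm{Var}\left(\frac{y}{n}\right) = \frac{\pi (1 - \pi)}{n}.$$

The variance is therefore a function of the mean: it is largest at $\pi = 0.5$ and shrinks toward zero as $\pi$ approaches 0 or 1. Individuals with different germination proportions necessarily have different variances, and the variance also changes with n, which differs among plots when RG is computed over the number of seeds with an embryo.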

With the extension of statistical analyses to other probability distributions, such as the binomial, made possible by GLMs, the models and methods used in germination studies can be revised. The Q-Q plot indicated that the binomial distribution was the most suitable model for the absolute and relative germination of L. ericoides. This result will probably be obtained when other species, regardless of the presence or absence of the embryo, are statistically analyzed with this model. Montgomery (2000) reports the superiority of the GLM approach compared to transformation and non-parametric statistics. Specifically, Jatropha curcas and orchid seedling germination data were better fitted by a binomial logistic model (Mora et al. 2008; Araújo 2012).

The high experimental coefficients of variation (CV = 80.5 % for G and 71.4 % for RG) for L. ericoides germination are not a consequence of the non-normality of the residuals or of heterogeneous variances. This does not, however, exclude the possibility that the problems that inflated the CV also undermined those assumptions. For L. ericoides seeds, the factor that most increased the CV was the absence of germination in one or more repetitions. The presence of zeros inflates variability, with an immediate effect on the CV (Ahmad et al. 2006), but it is not the only factor. The instability of germination between repetitions of the same individual also contributed to the increase in CV.
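The reported CVs can be checked from values given in this article. The sketch below assumes the usual definition, CV = 100 × √(residual mean square) / grand mean, and a balanced design, so that the grand mean equals the mean of the 20 individual means. Neither assumption is stated explicitly in this section. The residual mean squares come from the ANOVA table and the individual means from the table of germination per individual.

```python
import math

# Mean G (%) and RG (%) of individuals 1 to 20, as tabulated in this article.
G_MEANS = [0.0, 2.5, 1.5, 1.5, 1.5, 9.5, 12.0, 4.0, 2.0, 1.5,
           15.0, 0.0, 14.5, 1.0, 0.0, 6.5, 2.0, 6.0, 0.0, 9.0]
RG_MEANS = [0.0, 15.7, 6.3, 9.4, 4.7, 41.8, 78.1, 18.8, 8.3, 13.6,
            72.4, 0.0, 52.4, 3.6, 0.0, 45.3, 7.7, 30.8, 0.0, 66.9]


def cv_percent(ms_residual, individual_means):
    """Experimental CV (%), assuming a balanced design."""
    grand_mean = sum(individual_means) / len(individual_means)
    return 100.0 * math.sqrt(ms_residual) / grand_mean


print(round(cv_percent(13.133, G_MEANS), 1))    # 80.5
print(round(cv_percent(288.465, RG_MEANS), 1))  # 71.4
```

With a grand mean of only 4.5 % for G, even a modest residual standard deviation (about 3.6 percentage points) yields a very large CV, which is consistent with the role attributed above to replicates without germination.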

The increased accuracy of GLMs for variables with a binomial distribution points to the future of statistical analysis for seed germination experiments and to the need for new procedures that can be used as alternatives to ANOVA. The germination experiment with L. ericoides seeds presented here is only one among a number of studies that have shown the low efficiency of ANOVA and data transformation. Regardless of the field of study, however, non-parametric tests and data altered by transformations should be avoided.

The divergence between seed scientists and statisticians leads to the question of whether scientific studies of germination that use GLM/ANOVA models, data transformation and non-parametric tests are incorrect. The answer is no. Even within the GLMs context, different distributions and link functions for germination data generate distinct efficiencies, and less efficient does not mean incorrect. This is the relationship between GLM/ANOVA and GLMs. It is possible to lose information when more contemporary techniques are ignored (Wilcox 1998), but the use of GLMs to analyze germination experiments is not a guarantee of maximum efficiency.

Acknowledgements

We acknowledge the Conselho Nacional de Pesquisa e Desenvolvimento Tecnológico (CNPq) and the Fundação de Pesquisa do Estado de Minas Gerais (FAPEMIG) for financial support.

References

Agresti A. 2002. Categorical data analysis. 2nd. edn. New York, John Wiley and Sons Press.
Ahmad WMAW, Naing NN, Rosli N. 2006. An approached of Box-Cox data transformation to biostatistics experiment. Statistika 6: 1-6.
Ahrens WH, Cox DJ, Budhwar G. 1990. Use of the arcsine and square root transformations for subjectively determined percentage data. Weed Science 38: 452-458.
Araújo GLD. 2012. Métodos de estimação em regressão logística com efeito aleatório: aplicação em germinação de sementes. PhD Thesis, Universidade Federal de Viçosa, Viçosa.
Austin MP. 1987. Models for the analysis of species response to environmental gradient. Vegetatio 69: 35-45.
Benites VM, Caiafa AN, Mendonça ES, Schaefer CE, Ker JC. 2003. Solo e vegetação nos complexos rupestres de altitude da Mantiqueira e do Espinhaço. Floresta e Ambiente 10: 76-85.
Biondini ME, Mielke Jr PW, Berry KJ. 1988. Data-dependent permutation techniques for the analysis of ecological data. Vegetatio 75: 161-168.
Box GEP. 1953. Non-Normality and tests on variances. Biometrika 40: 318-335.
Bradley JY. 1978. Robustness? British Journal of Mathematical and Statistical Psychology 31: 144-152.
Dayrell RLC, Garcia QS, Negreiros D, Baskin CC, Baskin JM, Silveira FAO. 2016. Phylogeny strongly drives seed dormancy and quality in a climatically buffered hotspot for plant endemism. Annals of Botany 119: 267-277.
Driscoll WC. 1996. Robustness of the ANOVA and Tukey-Kramer statistical tests. Computers & Industrial Engineering 31: 265-268.
Faraway JJ. 2006. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. New York, Chapman and Hall.
Fernandez GCJ. 1992. Residual analysis and data transformations: Important tools in statistical analysis. HortScience 27: 297-300.
Fisher RA. 1925. Statistical methods for research workers. London, Oliver & Boyd.
Fisher RA. 1934. Two new properties of mathematical likelihood. Proceedings of the Royal Society A 144: 285-307.
Glass GY, Peckham PD, Sanders JR. 1972. Consequences of failure to meet the assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research 42: 237-288.
Gomez KA, Gomez AA. 1984. Statistical procedures for agricultural research. 2nd. edn. New York, Wiley.
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. 1986. Robust statistics: the approach based on influence functions. New York, Wiley.
Harwell MR, Rubinstein EN, Hayes WS, Olds CC. 1992. Summarizing Monte Carlo results in methodological research: the one- and two-factor fixed effects ANOVA cases. Journal of Educational Statistics 17: 335-339.
Jaeger TF. 2008. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language 59: 434-446.
Kempthorne O. 1952. Design and analysis of experiments. New York, Wiley.
Levine DW, Dunlap WP. 1983. Data transformation, power, and skew: a rejoinder to Games. Psychological Bulletin 93: 596-599.
Little TM, Hills FJ. 1978. Agricultural experimentations: Design and analysis. New York, Wiley.
McCulloch CE, Searle SR, Neuhaus JM. 2008. Generalized, Linear, and Mixed Models. 2nd. edn. New Jersey, Wiley.
Montgomery DC. 2000. Design and analysis of experiments. 5th. edn. New York, John Wiley & Sons.
Mora F, Gonçalves LM, Scapim CA, Martins E L, Machado MFPS. 2008. Generalized Lineal Models for the analysis of binary data from propagation experiments of Brazilian orchids. Brazilian archives of Biology and Technology 51: 963-970.
Nelder JA, Wedderburn RWM. 1972. Generalized Linear Models. Journal of the Royal Statistical Society 135: 370-384.
Oliveira RS, Galvão HC, Campos MCR, Eller CB, Pearse SJ, Lambers H. 2015. Mineral nutrition of campos rupestres plant species on contrasting nutrient-impoverished soil types. New Phytologist 205: 1183-1194.
Osborne JW. 2010. Improving your data transformations: applying the Box-Cox transformation. Practical Assessment, Research & Evaluation 15: 1-9.
Perea R, Venturas M, Gil L. 2013. Empty seeds are not always bad: simultaneous effect of seed emptiness and masting on animal seed predation. Plos One 8(6): e65573. doi: https://doi.org/10.1371/journal.pone.0065573
Pimentel-Gomes F. 2000. Curso de estatística experimental. 14th. edn. Piracicaba, Nobel.
Rasmussen JL. 1985. Data transformation maximizing homoscedasticity and within-group normality. Behavior Research Methods, Instruments & Computers 17: 411-412.
Sakia RM. 1992. The Box-Cox transformation technique: a review. Journal of the Royal Statistical Society 41: 169-178.
Semir J. 1991. Revisão taxonômica de Lychnophora Mart. (Vernonieae: Compositae). PhD Thesis, Universidade Estadual de Campinas, Campinas.
Sharpe K. 1970. Robustness of Normal Tolerance Intervals. Biometrika 57: 71-78.
Sileshi GW. 2007. Evaluation of statistical procedures for efficient analysis of insect, disease and weed abundance and incidence data. East African Journal of Science 1: 1-9.
Sileshi GW. 2012. A critique of current trends in the statistical analysis of seed germination and viability data. Seed Science Research 22: 145-159.
Sokal RR, Rohlf FJ. 1995. Biometry: the principles and practice of statistics in biological research. 3rd. edn. New York, W. H. Freeman.
Steel RGD, Torrie JH. 1980. Principles and procedures of statistics. New York, McGraw-Hill.
Tonetti OAO, Davide AC, Silva EAA. 2006. Physical and physiological quality of Eremanthus erythropappus (DC.) Mac. Leish. Revista Brasileira de Sementes 28: 114-121.
Valcu M, Valcu CM. 2011. Data transformation practices in biomedical sciences. Nature Methods 8: 104-105.
Warton D, Hui F. 2011. The arcsine is asinine: the analysis of proportions in ecology. Ecology 92: 3-10.
Wilcox RR. 1998. How many discoveries have been lost by ignoring modern statistical methods? American Psychologist 53: 300-314.
Wilson K, Hardy ICW. 2002. Statistical analysis of sex ratios: an introduction. In: Hardy ICW. (ed.) Sex ratios: concepts and research methods. Cambridge, Cambridge University Press. p. 49-92.
Zhao L, Chen Y, Schaffner DW. 2001. Comparison of logistic regression and linear regression in modeling percentage data. Applied and Environmental Microbiology 67: 2129-2135.

Publication Dates

Publication in this collection
15 Feb 2018
Date of issue
Apr-Jun 2018

History

Received
14 Sept 2017
Accepted
18 Dec 2017

This is an open-access article distributed under the terms of the Creative Commons Attribution License

Ind.	G (%)	Seeds without embryo	RG (%)	Ind.	G (%)	Seeds without embryo	RG (%)
1	0.0	81.5	0.0	11	15.0	79.5	72.4
2	2.5	88.0	15.7	12	0.0	71.5	0.0
3	1.5	68.0	6.3	13	14.5	73.0	52.4
4	1.5	82.0	9.4	14	1.0	65.5	3.6
5	1.5	74.0	4.7	15	0.0	81.0	0.0
6	9.5	77.0	41.8	16	6.5	83.5	45.3
7	12.0	85.5	78.1	17	2.0	77.5	7.7
8	4.0	80.5	18.8	18	6.0	78.0	30.8
9	2.0	79.0	8.3	19	0.0	90.0	0.0
10	1.5	80.5	13.6	20	9.0	85.0	66.9

Variable	Scale	Kolmogorov-Smirnov statistic (P-value)	Normal residuals	Levene F (P-value)	Homogeneous variances
G	Original	0.269 (0.000)	No	2.856 (0.001)	No
G	$\arcsin\sqrt{x/100}$	0.312 (0.000)	No	2.895 (0.001)	No
RG	Original	0.276 (0.000)	No	5.468 (0.000)	No
RG	$\arcsin\sqrt{x/100}$	0.295 (0.000)	No	4.841 (0.000)	No

ANOVA
Germination (G)
Source	df	SS	MS	F statistic	P-value
Individual	19	1856.0	97.684	7.438	0.000
Residual	60	788.0	13.133
CV = 80.5 %
Relative germination (RG)
Source	df	SS	MS	F statistic	P-value
Individual	19	52253.157	2750.166	9.534	0.000
Residual	60	17307.878	288.465
CV = 71.4 %

Kruskal-Wallis test
Variable	df	H statistic	P-value
Germination (G)	19	55.340	0.000
Relative germination (RG)	19	58.773	0.000

Scott-Knott test				Dunn test
Individual	G (%)	Individual	RG (%)	Individual	G (%)	RG (%)
6, 7, 11, 13, 20	9-15 a	7, 11, 20	66.9-78.1 a	1-20	0-15 a	0-81.3 a
1-5, 8-10, 12, 14-19	≤ 6.5 b	6, 13, 16	41.8-52.4 b
		1-5, 8-10, 12, 14, 15, 17-19	≤ 30.8 c

Generalized Linear Model - normal distribution - identity link
G (%)
Source	Difference of df	Difference of deviance	F	P-value
Individual	19	1856.0	7.438	< 0.001
Residual	60	788.0
D²		70.2 %
RG (%)
Source	Difference of df	Difference of deviance	F	P-value
Individual	19	52253.16	9.534	< 0.001
Residual	60	17307.88
D²		75.1 %

Generalized Linear Model - binomial distribution - logit link
G (%)
Source	Difference of df	Difference of deviance	P-value
Individual	19	213.77	< 0.001
Residual	60	99.74
D²		64.5 %
RG (%)
Source	Difference of df	Difference of deviance	P-value
Individual	19	300.31	< 0.001
Residual	60	104.27
D²		74.2 %