Does soybean sample size impact Tukey’s test for non-additivity?

ABSTRACT: This study analyzed the influence of sample size on Tukey's test for non-additivity and determined the sample size that optimizes the test for soybean grain yield. Six experiments were conducted in a randomized complete block design with either 20 or 30 cultivars and three replications of each treatment. Grain yield was determined per plant, totaling 9,000 sampled plants. Next, sampling scenarios of up to 100 plants were simulated, estimating the F statistic for one degree of freedom of the error in each scenario. After that, the optimal sample size was defined via power models and maximum curvature points. Results showed that the number of sampled plants per experimental unit influences the estimates of Tukey's test for non-additivity. In addition, sampling 14 to 19 plants per experimental unit maintains the accuracy of the test.
Key words: analysis of variance, experimental planning, Glycine max, mathematical assumptions.

Early research on the analysis of variance, including studies regarded as classic references in the field, brings numerous discussions of alternative ways to verify whether its mathematical assumptions are met or violated (BARTLETT, 1947; COCHRAN, 1947; EISENHART, 1947), a topic that is still the subject of more recent reports (WELHAM et al., 2015; BUTLER, 2021). TUKEY (1949) highlighted an assumption that is often forgotten in scientific research, namely the additivity of the mathematical model, and proposed a methodology to assess it, named "Tukey's test for non-additivity". This tool separates one degree of freedom from the experimental error, which consists of the interaction between the main factors of the analysis of variance, and then applies the F test to identify the presence or absence of additivity in the mathematical model (TUKEY, 1949; BUTLER, 2021). The premise of this methodology is that separating a degree of freedom from the error is only possible when rows and columns, that is, the main effects, are not additive (TUKEY, 1949). In that case, the reliability of the analysis of variance tends to be compromised, requiring either data transformation or the use of non-Gaussian methodologies (BUTLER, 2021).
Notably, this is the method that has gained the greatest visibility for identifying the additivity of analysis-of-variance models (ŠIMEČEK & ŠIMEČKOVA, 2013). However, the factors that may interfere with its estimates have not yet been investigated in depth. Soybean studies that use the analysis of variance (SOUZA et al., 2021; SODRÉ FILHO et al., 2022) adopt different sample sizes per experimental unit, ranging from 5 to 20 sampled plants. Moreover, SOUZA et al. (2022) showed that the F statistic varies as a function of sample size for soybean, and this statistic is used in the methodology by TUKEY (1949). Thus, the number of sampled plants per experimental unit could be a factor affecting the bias of estimates from Tukey's test for non-additivity. On this basis, this study analyzed the influence of sample size on Tukey's test for non-additivity and determined the sample size that optimizes the test for soybean grain yield.
The statistical analyses were performed through specific routines built in the R environment (R DEVELOPMENT CORE TEAM, 2022). First, data were stratified per experimental unit in all experiments. Next, 31 sampling scenarios were planned (n = 1, 2, ..., 20, 25, ..., 50, 60, ..., 100 plants per experimental unit), and in each scenario resampling with replacement (bootstrap) was simulated 10,000 times (EFRON, 1979) using the sample() function. After that, for each resampling, a multiple linear regression was fitted using the lm() function, considering grain yield per plant as the dependent variable and the effects of genotypes and blocks as independent variables. The fitted values of each model were then squared, yielding the term that represents the genotype × block interaction ($\lambda_{G_i\beta_r}$). This operation makes it possible to isolate one degree of freedom of the experimental error, as highlighted by TUKEY (1949). To verify the additivity of the analysis-of-variance model, the $\lambda_{G_i\beta_r}$ parameter was added to it and the analysis was performed using the aov() function. The following mathematical model was used: $Y_{ir} = m + G_i + \beta_r + \varepsilon_{ir} + \lambda_{G_i\beta_r}$, where $Y_{ir}$ is the value observed for the response variable in plot $ir$, $m$ is the overall mean, $G_i$ is the fixed effect of level $i$ of the genotype factor, with $i$ = 1, 2, ..., 30 for E1, E2, and E3 and $i$ = 1, 2, ..., 20 for E4, E5, and E6, $\beta_r$ is the random effect of level $r$ ($r$ = 1, 2, 3) of the block, $\varepsilon_{ir}$ is the effect of the experimental error, and $\lambda_{G_i\beta_r}$ is as previously described. Afterwards, the F statistic of the $\lambda_{G_i\beta_r}$ parameter, with one degree of freedom, was extracted. This statistic was calculated 1,860,000 times (31 sample sizes per experimental unit × 10,000 resamplings × 6 reference experiments).
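The resampling and test steps described above can be sketched in code. The example below is a minimal illustration in Python rather than the authors' R routines. It assumes that each plot is summarized by the mean of the n resampled plants, and it uses the closed-form sums of squares of Tukey's one-degree-of-freedom test instead of the lm() and aov() calls. The function names and the `plants[i][r]` data layout are hypothetical.

```python
import random
from statistics import mean


def tukey_nonadditivity_f(table):
    """F statistic (1 df) of Tukey's test for a genotypes x blocks table.

    table[i][r] is the value of plot (genotype i, block r).
    """
    a, b = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    g_eff = [mean(row) - grand for row in table]  # genotype effects
    b_eff = [mean(row[r] for row in table) - grand for r in range(b)]  # block effects

    # Residual sum of squares of the additive model (genotype + block).
    ss_error = sum(
        (table[i][r] - grand - g_eff[i] - b_eff[r]) ** 2
        for i in range(a)
        for r in range(b)
    )
    # Sum of squares of the single non-additivity degree of freedom.
    cross = sum(
        table[i][r] * g_eff[i] * b_eff[r] for i in range(a) for r in range(b)
    )
    ss_nonadd = cross ** 2 / (sum(g * g for g in g_eff) * sum(e * e for e in b_eff))

    # Error left after removing that degree of freedom.
    df_rest = (a - 1) * (b - 1) - 1
    return ss_nonadd / ((ss_error - ss_nonadd) / df_rest)


def bootstrap_f(plants, n, n_boot, rng):
    """F statistics from n_boot resamplings of n plants per plot.

    plants[i][r] is the list of per-plant yields of plot (genotype i, block r).
    """
    stats = []
    for _ in range(n_boot):
        # Draw n plants with replacement in every plot and keep the plot mean.
        table = [[mean(rng.choices(plot, k=n)) for plot in row] for row in plants]
        stats.append(tukey_nonadditivity_f(table))
    return stats
```

Under these assumptions, calling `bootstrap_f(plants, n, 10_000, random.Random(0))` for each of the 31 values of n would give one distribution of F values per sampling scenario for a single experiment.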
The values extracted in each sampling scenario were subjected to descriptive analysis, calculating the minimum, 2.5th percentile, mean, 97.5th percentile, and maximum values. The width of the 95% confidence interval (CI95%) was obtained as the difference between the 97.5th and 2.5th percentiles. Then, the CI95% estimates and the number of plants per experimental unit (planned sampling scenarios) were fitted through the nls() function with the following power model: $CI_{95\%} = \alpha \times n^{\beta} + \varepsilon$, where $\alpha$ is the intercept coefficient, $n$ is the sample size, $\beta$ is the exponential rate of decay, and $\varepsilon$ is the random error. To verify the goodness of fit of the power model, the following quality indicators were used: coefficient of determination (R²), root mean square error (RMSE), and Willmott's agreement index (d). Finally, this model was used in each experiment to apply four maximum-curvature-point methods (general, perpendicular distances, linear plateau response, and spline), described by SILVA & LIMA (2017), using the maxcurv() function from the soilphysics package, in order to estimate the optimal sample size for Tukey's test for non-additivity.
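As a rough illustration of how a sample size can be read off the fitted curve, the sketch below computes the percentile-based interval width, fits the power model, and locates a maximum curvature point by perpendicular distances, taken here as the point on the curve farthest from the straight line joining its two end points. It is written in Python and is not the soilphysics implementation: the power model is fitted by least squares on the log scale rather than with nls(), percentiles use linear interpolation, the search runs over integer n only, and all function names are hypothetical.

```python
import math


def ci95_width(values):
    """Difference between the 97.5th and 2.5th percentiles of values."""
    s = sorted(values)

    def percentile(p):
        # Linear interpolation between the two closest order statistics.
        pos = p * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return percentile(0.975) - percentile(0.025)


def fit_power(ns, widths):
    """Fit width = alpha * n**beta by least squares on the log-log scale.

    Assumption: log-linear fit, which can differ from a nonlinear fit.
    """
    x = [math.log(n) for n in ns]
    y = [math.log(w) for w in widths]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    alpha = math.exp(my - beta * mx)
    return alpha, beta


def max_curvature_pd(alpha, beta, n_min=1, n_max=100):
    """Integer n whose curve point is farthest from the end-to-end line."""

    def f(n):
        return alpha * n ** beta

    x1, y1, x2, y2 = n_min, f(n_min), n_max, f(n_max)
    norm = math.hypot(x2 - x1, y2 - y1)

    def distance(n):
        # Perpendicular distance from (n, f(n)) to the line through the end points.
        return abs((y2 - y1) * n - (x2 - x1) * f(n) + x2 * y1 - y2 * x1) / norm

    return max(range(n_min, n_max + 1), key=distance)
```

With one CI95% width per sampling scenario, `fit_power(ns, widths)` followed by `max_curvature_pd(alpha, beta)` gives one candidate sample size per experiment. Because the distance is measured in the original units of both axes, the result depends on the range of n considered.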
The number of sampled plants per experimental unit directly affected the estimates of Tukey's methodology for non-additivity (Figure 1). This result brings insights into a poorly documented response of this test, namely its response to sample size, and shows that the separation of a degree of freedom from the experimental error, as proposed by TUKEY (1949), is influenced by the number of samples used. A stronger tendency to overestimate F is observed in small sampling scenarios, such as when ≤ 3 plants are sampled. This bias persists for samples of ≤ 8 plants, with the CI95% narrowing gradually as more plants are sampled and, consequently, the estimates becoming more reliable. SOUZA et al. (2022) also observed an exponentially decreasing response when analyzing the CI95% of the F test applied to the effect of genotypes, and other studies have shown similar results for different statistics, such as TOEBE et al. (2018) and BITTENCOURT et al. (2022). In addition, the mean of the F statistic for model additivity is not constant: smaller sample sizes show slightly higher F values, which stabilize as the sample size increases. Hence, the precision of Tukey's test for non-additivity improves in scenarios with larger samples, which confirms the sensitivity of the test to sample size.
In this sense, sample size determination was performed reliably, since the six parametrized power models showed a satisfactory fit (Table 1), based on quality indicators. R² values were ≥ 0.95 and d ≥ 0.99, and RMSE did not exceed 0.92; the least efficient model was the one parametrized for E4, although it still reached high precision (WILLMOTT et al., 2012). The four methods for defining the maximum curvature point of each model yielded quite different sample sizes, with recommendations varying from 2 to 36 plants per experimental unit. As observed, this oscillation in sample dimensioning depends on the technique used. BITTENCOURT et al. (2022), when using the same four maximum-curvature-point methods, also noted different results. The same authors verified that smaller sample sizes were obtained through the general and spline methods than through the perpendicular distances and linear plateau response methods, which also occurred in this study (Figure 2). The general method, for instance, presented sample sizes varying from 2 to 5 plants per experimental unit between experiments, while the linear plateau response method suggested 16 to 36 plants. Based on the CI95%, the values recommended by the general method (≤ 5 plants) may lead to biased results, since the CI95% has not yet stabilized at those sample sizes. The same holds for the spline method (≤ 15 plants), especially as shown in Figures 2a, 2b, and 2d. The perpendicular distances and linear plateau response methods give larger sample sizes, which are closer to the CI95% stabilization point and are thus more efficient for dimensioning sample size in this case. Importantly, although the recommendations obtained through the linear plateau response method reached 36 plants per experimental unit, the number of plants defined through the perpendicular distances method is ≤ 19. Thus, the linear plateau response method not only resulted in a considerably larger number of plants than the perpendicular distances method, but it also gains very little precision compared to the latter. BITTENCOURT et al. (2022) observed the same situation when defining sample size for the overall experimental mean in cauliflower seedlings, and SOUZA et al. (2022) used the perpendicular distances method to estimate sample size for precision statistics in soybean. Such studies reinforce the use of the perpendicular distances method; therefore, we recommend sampling 14 to 19 plants per experimental unit in order to optimize the estimates of Tukey's test for non-additivity, which will enable accurate verification of the additivity assumption in analysis-of-variance models for soybean grain yield. However, the recommendations made here should not be followed without preliminary studies in experiments carried out under conditions very different from those described here, and should serve merely as a starting point for researchers who measure other traits in soybean.

Figure 1 - Minimum, 2.5th percentile, mean, 97.5th percentile, and maximum values of the F statistic for one degree of freedom of the experimental error, according to the methodology described by Tukey (1949), in soybean, for the first sowing date (E1 - a), second sowing date (E2 - b), and third sowing date (E3 - c) in Erval Seco, and the first sowing date (E4 - d), second sowing date (E5 - e), and third sowing date (E6 - f) in Itaqui.

Figure 2 - Sample size determination via power model and maximum curvature points for Tukey's test for non-additivity for the first sowing date (E1 - a), second sowing date (E2 - b), and third sowing date (E3 - c) in Erval Seco, and the first sowing date (E4 - d), second sowing date (E5 - e), and third sowing date (E6 - f) in Itaqui.

Table 1 - Coefficient of determination (R²), root mean square error (RMSE), and d index of the power models, and maximum curvature points and sample sizes for Tukey's test for non-additivity.