Assessing transformation methods for group comparisons under violated assumptions: type I error rate and test power

ABSTRACT In this study, several transformation methods that are applied when the assumptions of analysis of variance are not met were evaluated in terms of type I error rate and test power under different distributions, numbers of groups, numbers of observations, variance ratios, and standard deviation differences. The data set consisted of random numbers generated from the N(0,1) and χ2(3) distributions using the random function of the Numpy library in the Python programming language. The logarithmic, square root and root transformations were evaluated with ANOVA across the simulation combinations. Taking the square root after adding 0.5 or 0.375 to the data was relatively more reliable than the other transformations in terms of type I error rate. However, in every case, the type I error rate determined at the beginning of the experiment increased both before and after the transformation was applied. Interestingly, the third- and fourth-degree root transformations gave better test power under the right-skewed distribution. In addition, the transformation techniques in question were compared on a real data set with respect to the normality of the data and the homogeneity of variances.


INTRODUCTION
Most studies examining the effects of any treatment on the means of the groups consider three or more groups. The analysis of variance (ANOVA-F) test is still widely used today as a parametric test method for comparing the means of more than two groups. Some assumptions must be met before conducting parametric tests. The assumptions for ANOVA are independence of observations, additivity of factor effects, homogeneity of variances between or among groups, and normality of the data. The normality of the observations and the homogeneity of the group variances are related to the assumed populations; hence the researcher cannot always interfere with these assumptions. Therefore, if these assumptions are not met, the results of the ANOVA are invalid (Larson, 2008; Mendes, 2012).
Applying the ANOVA without meeting the assumptions causes a deviation from the predetermined type I error rate (5.0%), thus affecting the test power. Consequently, the true differences between the means of the groups may not be revealed. After checking the assumptions with conventional approaches, there are some alternative options if the assumptions of ANOVA are not met. In this sense, Tukey (1957) suggested that if the assumptions of ANOVA are not met, transformation techniques can be used on the questionable data. Some studies have examined the type I error rate and the test power in comparing the means of more than two groups using parametric and nonparametric tests (Mendeş, 2002; Patric, 2007; Koşkan and Gürbüz, 2009; Ferreira et al., 2012; Lantz, 2013).
In general, quantitative data are expected to follow a normal distribution, but in practice they do not always do so, and may therefore fail to satisfy the assumptions that observations are normally distributed and variances are homogeneous.
Data transformation, which is one of the options that can be applied in this case, gives a new form to the questionable data by using a variety of mathematical operations. Some researchers claim that "transforming data is an inappropriate way or data cheating". The missing part in this critique is that these transformations are applied to all data, not just a part of it, so there is no cheating or voluntary manipulation. Furthermore, the data transformation technique ensures the validity of the statistical test. In the literature, there are many simulation studies investigating the effects of transformation techniques on the ANOVA in terms of type I error rate and test power. Some of these studies reported that various transformation techniques had negative effects on type I error rate and test power (Arıcı et al., 2011), while others reported positive effects (Mahapoonyanont et al., 2010; Özkan et al., 2010; Arıcı, 2012; Yiğit, 2012). Maidapwad and Sananse (2014) emphasized that many researchers start conducting variance analysis without checking the normality assumption, which leads to information loss in the obtained results. To support this claim, they demonstrated the effects of various transformation techniques on group comparisons. Hammouri et al. (2020) mentioned the positive effects of conducting group comparisons after logarithmic transformation of data with skewed distributions. This study is one of the recent significant works in this field.
As highlighted by Blanca et al. (2017), if the distribution shapes of the assumed populations exhibit moderate deviations from normality, the assumption of same population distribution shapes holds, each group has equal sample size, and the sample size is large, then the technique of analysis of variance (ANOVA) is a powerful method. However, researchers may sometimes have doubts about which sample size is sufficient or how much deviation from normality can be tolerated.
There are various methods that can serve as alternatives to the ANOVA technique when its assumptions are not met. In addition to transforming the data, researchers generally use non-parametric methods such as the Kruskal-Wallis test when the data do not meet the normality assumption. However, the Kruskal-Wallis test is also heavily influenced by heterogeneity of variances (Liu, 2015).
The purpose of this study was to analyze the effects of different transformation techniques on one-way analysis of variance: the logarithmic transformation, the square root transformations (the plain square root and the square roots taken after adding 0.5 and 0.375 to the data) and the root transformations (third- and fourth-degree roots). The focus is on assessing both type I error rates and test powers in situations where the assumptions of normal distribution and homogeneity of variances are not met.

MATERIALS AND METHODS
The data set of this study consisted of randomly generated numbers from the N(0,1) and χ2(3) distributions, determined according to the simulation design given in Table 1. The Numpy library in the Python programming language was used to generate the random numbers (Harris et al., 2020). Density plots of the theoretical distributions used are shown in Figure 1. We also compared transformed and non-transformed datasets for normality and homogeneity of variances on real data. The Shapiro-Wilk and Bartlett tests were conducted on the real data to assess the normality and homogeneity of variances, respectively. Detailed information about the real dataset is given in later sections.

METHODS
Simulation designs were set up for each distribution, N(0,1) and χ2(3), as follows: the number of groups was set to 3, and the number of observations in each group to 3, 5, 10, 15, and 30. In addition, the variance of the final group was set to 1, 3, 5, and 10 times the variance of the other groups.
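As a rough illustration of how many simulation combinations this design implies, the factor levels quoted in the text can be crossed as follows. This is only a sketch: the variable names are illustrative, and Table 1 remains the authoritative description of the design.

```python
from itertools import product

# Factor levels as quoted in the text; the names are illustrative assumptions.
distributions = ["normal", "chi2_3"]
sample_sizes = [3, 5, 10, 15, 30]
variance_ratios = [1, 3, 5, 10]        # variance of the final group relative to the others
sd_differences = [0, 0.5, 1, 1.5, 2]   # 0 corresponds to the type I error setting

# Full crossing of the four factors.
design = list(product(distributions, sample_sizes, variance_ratios, sd_differences))
print(len(design))  # 2 * 5 * 4 * 5 = 200 combinations
```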
The standard deviation differences among the means were set to 0, 0.5, 1, 1.5, and 2. Each simulation combination was iterated 100000 times. Because the populations had different means and variances, each observation was standardized, so that the means and variances of all populations were equalized. Samples were generated from the standardized populations according to the determined sample sizes. When the type I error rate was the focus and the variances were homogeneous, the observations were used as they were. When the variances were heterogeneous, the observations in the final group were multiplied by the square roots of the constants corresponding to the specified variance ratios. When the power of the test was the focus, standard deviation differences were created by adding constants to the final group. Whether the differences among the group means were due to chance was determined with one-way ANOVA. The type I error rate was calculated by dividing the number of rejected H0 hypotheses in 100000 simulations, before and after the transformations were applied to the observations, by the total number of simulations. For the power of the test, standard deviation differences were created, and the number of rejected H0 hypotheses in 100000 simulations, before and after transformation, was divided by the total number of simulations. The nominal significance level (α) was set to 5.0% in this simulation study. A flowchart of the simulation program used to compute the type I error rate and test power is shown in Figure 2.
Arq. Bras. Med. Vet. Zootec., v.75, n.5, p.883-892, 2023

Figure 2. Flowchart of simulation program
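The simulation procedure described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the authors' program: the function names, the use of `numpy.random.default_rng` and `scipy.stats.f_oneway`, the order of scaling and shifting the final group, and the reduced default number of iterations are all choices made here for illustration. How the study handled negative standardized values before applying root or logarithmic transformations is not stated in the text, so the `transform` argument is left to the caller.

```python
import numpy as np
from scipy import stats

def draw_standardized(rng, dist, size):
    """Draw from N(0,1) or chi-square(3), standardized to mean 0 and variance 1."""
    if dist == "normal":
        return rng.standard_normal(size)
    if dist == "chi2_3":
        # A chi-square distribution with 3 df has mean 3 and variance 6.
        return (rng.chisquare(3, size) - 3.0) / np.sqrt(6.0)
    raise ValueError(f"unknown distribution: {dist}")

def rejection_rate(dist, n, var_ratio=1.0, sd_diff=0.0, n_iter=2000,
                   alpha=0.05, transform=None, seed=0):
    """Share of iterations in which one-way ANOVA rejects H0 for 3 groups.

    With sd_diff == 0 this estimates the type I error rate, otherwise the power.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_iter):
        groups = [draw_standardized(rng, dist, n) for _ in range(3)]
        # Assumption: the final group is first scaled by sqrt(variance ratio)
        # and then shifted by the standard deviation difference.
        groups[-1] = groups[-1] * np.sqrt(var_ratio) + sd_diff
        if transform is not None:
            groups = [transform(g) for g in groups]
        _, p_value = stats.f_oneway(*groups)
        rejections += int(p_value < alpha)
    return rejections / n_iter

# Estimated type I error rate for normal data, homogeneous variances, n = 10.
rate = rejection_rate("normal", n=10)
print(rate)
```

Setting `n_iter=100000` would match the number of iterations reported in the study, at the cost of a much longer run time.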
It is well known that the analysis of variance is a frequently used statistical method to determine whether the difference between the means of two or more independent groups is due to chance or not. ANOVA, or in other words the F test, is used to test the H0 (null) and HA (alternative) hypotheses as described below in detail. The data generated by simulation can be described by Equation (1).
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \qquad (1)$$
where $Y_{ij}$ is the j-th observation in the i-th group, $\mu$ is the overall mean of the population, $\tau_i$ is the effect of the i-th treatment, and $\varepsilon_{ij}$ is the error term.
The null and alternative hypotheses can be stated as:
$$H_0: \mu_1 = \mu_2 = \dots = \mu_k$$
$$H_A: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j$$
where k is the number of experimental groups or treatments.
The F ratio is calculated by dividing the mean square of treatments (MST) by the mean square of error (MSE), $F = MST / MSE$. The critical F table value is determined with k - 1 and N - k degrees of freedom, where N is the total number of observations.
If the calculated F ratio is greater than the critical F table value, then H0 is rejected. H0 is not rejected when the calculated F ratio is lower than the critical F table value.
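The decision rule of the F test described above can be illustrated with a short Python sketch. The function name `one_way_f` and the example data are made up for illustration; `scipy.stats.f.ppf` is used here in place of a printed F table.

```python
import numpy as np
from scipy import stats

def one_way_f(groups, alpha=0.05):
    """Return (F ratio, critical F value, reject H0?) for a one-way layout."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations N
    grand_mean = np.concatenate(groups).mean()
    # Between-group (treatment) and within-group (error) sums of squares.
    ss_treat = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    mst = ss_treat / (k - 1)
    mse = ss_error / (n_total - k)
    f_ratio = mst / mse
    # Critical value with k - 1 and N - k degrees of freedom.
    f_crit = stats.f.ppf(1 - alpha, k - 1, n_total - k)
    return f_ratio, f_crit, f_ratio > f_crit

# Made-up observations for three groups, for illustration only.
groups = [[4.1, 5.0, 5.6, 4.8], [5.9, 6.3, 7.1, 6.6], [7.8, 8.4, 8.0, 9.1]]
f_ratio, f_crit, reject = one_way_f(groups)
print(f_ratio, f_crit, reject)
```

The F ratio computed this way should agree with the statistic returned by `scipy.stats.f_oneway` on the same groups.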

RESULTS
Simulation results for the type I error rates of ANOVA after transformations when the distributions are normal and χ2(3) are shown in Table 2. For the standard normal distribution, the type I error rates calculated without transformations were kept at 5% when variances were homogeneous, regardless of the sample size. The calculated type I error rates tended to increase when the variances were slightly heterogeneous and the sample size increased. As the sample size increased, this trend became more apparent. For instance, when the variance ratios were 1:1:5, the type I error rate after square root transformation was 6.9 and 7.7% for n=3 and n=30, respectively. As the variance heterogeneity increased, the type I error rates calculated without transformation were better than those calculated with transformation but still did not maintain the pre-determined type I error rate (5.0%). In addition, the square root transformations applied after adding 0.5 and 0.375 to the data were more reliable than the other transformation techniques when variances were heterogeneous. It can be concluded that the type I error rates increase after transformation when the homogeneity of variance is severely disrupted, at the ratio of 1:1:10, with increasing sample size. In Table 2, normal font denotes type I error rates for the standard normal distribution, while bold font denotes type I error rates for the χ2(3) distribution.
For the χ2(3) distribution, an increase in the sample sizes resulted in a 5.0% type I error rate in a scenario where the variances were homogeneous. All transformation techniques produced results that were in proximity to the pre-determined type I error rate (5.0%). Although the application of any transformation technique increased the type I error rate, it was found that logarithmic transformation produced a lower type I error rate, especially when n=50 and variances were heterogeneous. This trend was consistently observed across all heterogeneous variance ratios. Furthermore, when the variances were homogeneous, it was observed that the type I error rates approached 5.0% in non-transformed data.
When the variances were homogeneous, all transformation techniques brought the type I error rates to 5%. As the heterogeneity of the variances increased, the type I error rates in ANOVA could not be kept at the 5.0% level after any of the transformation techniques was applied, regardless of the sample size.
The power values of ANOVA for transformed and non-transformed observations from both distributions are presented in Tables 3 and 4. The power values that reached the desired level of 80% are presented in bold font according to standard deviation differences ranging from 0.5 to 2. In the case of the standard normal distribution, power values above 80% were achieved with a small sample size when variances were homogeneous; when variances were heterogeneous, this could only be attained with a larger sample size. Also, there was no significant difference between transformed and non-transformed values. Under the χ2(3) distribution, the power values obtained with the third- and fourth-degree root transformations were higher than those of the other transformation methods, especially when the variances were homogeneous and the standard deviation differences ranged between 0.5 and 2. In cases where the variance ratios were 1:1:3, the third- and fourth-degree root transformations were also more successful, particularly for low standard deviation differences and small sample sizes (such as 30). Similar results were obtained when the variances became increasingly heterogeneous, for example, in cases where the variance ratios were 1:1:5 and 1:1:10. Moreover, all applied transformation techniques reached or exceeded the desired power level of 80%. Among the transformation techniques, the third- and fourth-degree root transformations were more powerful than the rest under the χ2(3) distribution. The power values that reached or exceeded the desired power level of 80% are indicated in bold font for the χ2(3) distribution.

DISCUSSION
When the variances were homogeneous, the type I error rates with non-transformed and transformed data preserved the pre-determined value of 5% in ANOVA. This result agreed with Başpınar and Gürbüz (2000), Arıcı et al. (2011), Arıcı (2012), Yiğit (2012), and Blanca et al. (2017), who found that the type I error rates were preserved at 5% when the variances were homogeneous. When the variances were heterogeneous, the type I error rates for ANOVA with transformed and non-transformed data could not preserve the pre-determined value of 5%. Furthermore, the type I error rates tended to increase with transformed data when the sample size was 30 or larger. In a simulation study with data having a normal distribution, Arıcı (2012) reported that square root and logarithmic transformations increased the pre-determined (5%) type I error rate. Hence, the increase of the type I error rate when the variances deviate from homogeneity is consistent with the findings of Trumbo et al. In addition, under the χ2(3) distribution, the type I error rates of the transformed data did not preserve the pre-determined value of 5%. Tekindal (1999) reported that under the χ2(3) distribution the type I error rates in variance analysis were maintained at 5% after logarithmic and square root transformations were applied to the data. Therefore, our study is not consistent with the findings of Tekindal (1999). When the variances were heterogeneous, the type I error rates could not be maintained at the 5.0% level and took higher values after the transformation techniques were applied. Yiğit (2012) reported that logarithmic transformation did not provide reliable results when variances were heterogeneous. These findings are consistent with the results obtained from this study.
After applying transformations to skewed data, variance analysis yielded more powerful results compared to the non-transformed data. In the case of the right-skewed χ2(3) distribution, the test power values increased after transformations, particularly square root transformations, as the heterogeneity of the variances increased. In this context, the findings of Rasmussen and Dunlap (1991) and Çavuş and Yazıcı (2020) are similar to those of the present study.
Applying the logarithmic, square root, and root transformation techniques resulted in similar increases in the power values relative to ANOVA on the non-transformed data. With the logarithmic transformation, the test power increased with an increase in sample size. In this respect, these results are similar to those of Trumbo et al. (2004) and Mahapoonyanont et al. (2010).
When the standard deviation difference was 0.5 and variances were heterogeneous, the test power decreased in both cases, with and without transformations. Arıcı (2012) reported different results for this situation, finding that the test power values were adversely affected when the standard deviation differences were 1 and 1.5. After applying transformation methods, the test power values increased with an increase in heterogeneity levels and in the number of observations between populations (Arıcı, 2012). This finding is also consistent with the current study.
Table 3. Test power values when the distributions are standard normal distribution and χ2(3), and the variance ratios are 1:1:1 and 1:1:3
The first value in each cell of the table belongs to the standard normal distribution, while the second value belongs to the χ2(3) distribution.
Table 4. Test power values when the distributions are standard normal distribution and χ2(3), and the variance ratios are 1:1:5 and 1:1:10
The first value in each cell of the table belongs to the standard normal distribution, while the second value belongs to the χ2(3) distribution.
The Shapiro-Wilk and Bartlett tests were employed on the real data to determine the normality and homogeneity of variances, respectively. The open-access dataset used in this study was published in the Science Data Bank by Bousbia et al. (2021). The data included body measurements taken from cattle in Algeria, with a total of 130 adult cattle (30 males and 100 females) from 30 farms belonging to 4 region-specific ecotypes with distinct characteristics being measured. We used only one variable, Muzzle Circumference (MC), to assess the normality and homogeneity of variances on both transformed and non-transformed data. The results of the Shapiro-Wilk and Bartlett tests are given in Table 5. The hypotheses for the Shapiro-Wilk and Bartlett tests can be described as follows.
Shapiro-Wilk test:
H0: The data is normally distributed.
HA: The data is not normally distributed.
Bartlett test:
H0: The assumed population variances of the groups from which they are taken are equal.
HA: The assumed population variances of at least two (maybe all) groups from which they are taken are not equal.
If the p-value is greater than the nominal significance level of 0.05, the null hypothesis is not rejected.
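The two assumption checks described above can be run in Python along the following lines. This is a hedged sketch on synthetic data: the four groups below are randomly generated stand-ins, not the cattle measurements of Bousbia et al. (2021). The natural logarithm is an assumed form of the logarithmic transformation, and applying the Shapiro-Wilk test to the pooled variable is likewise an assumption made here for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic, right-skewed positive values for four hypothetical groups.
# These are NOT the muzzle circumference data used in the study.
groups = [rng.gamma(shape=2.0, scale=s, size=30) for s in (1.0, 1.5, 2.0, 2.5)]

transforms = {
    "none": lambda x: x,
    "sqrt(x)": np.sqrt,
    "sqrt(x + 0.5)": lambda x: np.sqrt(x + 0.5),
    "sqrt(x + 0.375)": lambda x: np.sqrt(x + 0.375),
    "cube root": np.cbrt,
    "fourth root": lambda x: x ** 0.25,
    "log(x)": np.log,  # assumed form of the logarithmic transformation
}

results = {}
for name, func in transforms.items():
    transformed = [func(g) for g in groups]
    # Shapiro-Wilk on the pooled variable (assumption), Bartlett across groups.
    _, p_normality = stats.shapiro(np.concatenate(transformed))
    _, p_homogeneity = stats.bartlett(*transformed)
    results[name] = (p_normality, p_homogeneity)

for name, (p_norm, p_homog) in results.items():
    print(f"{name:16s} Shapiro-Wilk p = {p_norm:.4f}   Bartlett p = {p_homog:.4f}")
```

A p-value above 0.05 in either column means the corresponding null hypothesis is not rejected for that transformation.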
Based on the Shapiro-Wilk test as tabulated in Table 5; the MC variable fitted the normal distribution after transformations since the p-value was greater than the nominal significance level (α = 0.05).The and transformations gave better results than others in terms of p-value.
When considering the results of the Bartlett test, the probability of accepting the null hypothesis significantly increased after all transformations. Thus, the homogeneity of variances, which is one of the most important assumptions of ANOVA, was met. Similar to the assumption of normal distribution, the and transformations yielded improved results. Especially in the case of right-skewed distributions such as χ2(3), the third- and fourth-degree root transformations provided significantly higher test power values. While current transformation techniques are relatively effective under specific conditions, they can be ineffective in many cases, which highlights the need for new transformation techniques. The necessity of modifying and improving current transformation techniques is one of the conclusions of this study. Based on the information provided above, the effect of the transformation techniques evaluated in this study can be examined with different sample sizes or with samples obtained from different continuous distributions.

Table 2. Type I error rates of ANOVA after transformations when the distribution is standard normal N(0,1) and χ2(3)

Table 5. Tests for normality and homogeneity of the variances for the MC variable