Non-parametric tests for small samples of categorized variables: a study

Contador, José Luiz; Senne, Edson Luiz França

doi:10.1590/0104-530X357-15

Abstract

This paper presents a study on non-parametric tests to verify the similarity between two small samples of variables classified into multiple categories. The study shows that the only tests available for this situation are the chi-square and the exact tests. However, asymptotic tests, such as the chi-square, may not work well for small samples, leaving exact tests as the alternative. Nevertheless, if the number of classes increases, the implementation of these tests can become very difficult, in addition to requiring specific algorithms that may demand considerable computational effort. Therefore, as an alternative to the exact tests, a new test based on the difference between two uniform distributions is proposed. Computational assays are conducted to evaluate the performance of these three tests. Although non-parametric tests present numerous applications in various areas of knowledge, this study was motivated by the need to verify whether the business strategy adopted by a company is a determining factor for its competitiveness.

Keywords:
Non-parametric tests; Small samples; Computer simulation; Competitive strategy

Resumo

Apresenta-se neste trabalho um estudo sobre testes não paramétricos para verificar a semelhança entre duas pequenas amostras de variáveis classificadas em múltiplas categorias. Mostra-se que, para essa situação, os únicos testes disponíveis são qui-quadrado e os testes exatos. Porém, testes assintóticos (como o qui-quadrado) podem não funcionar bem para pequenas amostras, sobrando como alternativa a aplicação de testes exatos. Mas, se o número de categorias cresce, a aplicação desses testes pode-se tornar bastante difícil, além de requerer algoritmos específicos, que podem exigir grande esforço computacional. Assim, um novo teste baseado na diferença de duas distribuições uniformes é proposto como uma alternativa ao teste exato. Ensaios computacionais são realizados para avaliar o desempenho desses três testes. Embora testes não paramétricos tenham inúmeras aplicações em diversas áreas de conhecimento, este trabalho surgiu motivado pela necessidade de verificar se a estratégia de negócio adotada pela empresa é um fator determinante para sua competitividade.

Palavras-chave:
Testes não paramétricos; Pequenas amostras; Simulação computacional; Estratégia competitiva

1 Introduction

This work was motivated by the need for an easily applied statistical test to support research based on the Fields and Weapons of Competition (FWC) model (Contador, 2008), in particular to gauge whether the business strategy adopted by a company is a determining factor of its competitiveness. In his research, the author of this model collected a small sample of companies and divided them into two groups: one formed by the most competitive companies and the other by the least competitive. The test is used to determine whether both groups adopt similar business strategies (null hypothesis H₀).

The proposed test can be used for any problem with the following characteristics:

a) Two different groups, I and II (for example, more competitive and less competitive companies), representing samples of larger populations, with n₁ and n₂ elements in each group, where n₁ and n₂ are small values;
b) For each group or sample, the random variable is recorded as frequencies in each of the m classes, m > 2 (see Table 1), i.e., the random variable is measured on a nominal scale or categorized with more than two categories;
c) The number of classes or categories that the random variable may assume (value of m) is moderate in relation to the n₁ and n₂ values.

Table 1
Frequencies of strategies (FC) for the groups of companies.

It should be noted that if the random variable could be classified into only two categories (e.g., two strategies), the problem could be easily solved by Fisher’s exact test (see Section 4), whatever the size of n₁ and n₂ of the samples from the two groups.

If, on the other hand, there were more than two categories for the random variable but a sufficiently large number of individuals in each class (a large-sample problem), it would also be easy to determine the similarity between the two sets of responses using the chi-square test. That test, however, can fail when small samples are involved.

The other non-parametric tests that are available (sign test, Wilcoxon signed-rank test, rank sum test, median test and t-test for paired data) are inadequate, as will be demonstrated through examples. Thus, for the case of small samples and more than two classes for the random variable, the problem is difficult to solve.

Therefore, the only safe alternative for addressing this type of problem is exact tests, such as the one presented in StatXact (2008), with the solution based on an extension of Fisher’s Exact Test (Fisher, 1970) proposed by Freeman & Halton (1951). However, the implementation of this test requires specific algorithms and, in some cases, considerable computational effort, which justifies the search for new tests for this type of problem.

In light of this, the present article presents a comparative study of the performance (capacity to decide H₀ correctly) of the exact tests, the chi-square test and a new test, proposed here, based on the difference between two uniform distributions. The effectiveness of these tests is compared using three indicators: the risks α and β, and the characteristic indicator (CI) extracted from the power curve, which is constructed through simulation.

The studies developed here focus on attempting to solve the problem of strategy related to the FWC model. For this reason, some concepts of this model are given in the following section, as they are essential for understanding the problem in question. The aim of this article is not to discuss or introduce the FWC model. If the reader would like to know more about the model, a source of further information is provided in the references.

Numerous other problems related to biology, medicine and the social and human sciences have the characteristics described above and could be addressed using the statistical techniques used here. Some examples of problems directly related to social engineering are:

− Determining whether two different types of employees (machine operators and office workers, for example) in small companies (with few workers) have similar motivations in order to develop a single incentives program (or include all workers in a single program);
− Determining, through a small sample of companies from different sectors (e.g., manufacturing and services) whether these companies value the same characteristics in their executives to standardize human development programs;
− Determining whether executives (few in number) from different business units of a corporation have similar managerial capacity;
− Determining whether two different production processes, by analyzing few parts, create products with similar levels of quality for different characteristics (size, finishing, etc.).

The main result of the work is that the effectiveness of the proposed test is similar to that of exact tests and that it performs well in situations in which the chi-square test fails (small samples and sparse, unbalanced data). Therefore, it is a real alternative to the exact test, the application of which often requires special software with restricted access.

Section 3 gives a brief discussion of non-parametric tests and a critical analysis of their application to the problem in question (strategy). Section 4 presents the solution adopted by StatXact for problems with categorized variables. Section 5 develops the proposed test, based on the difference between two uniform distributions. Section 6 presents the studies conducted to assess the performance of the three tests (the proposed test, the exact test and the chi-square test). The conclusions are given in Section 7. This final section also shows how the proposed test can be extended to problems with more than two independent samples and presents two examples in which the proposed test shows a clear advantage over the chi-square test.

2 Fields and weapons of competition model

According to the FWC model, companies focus their competitive strategy on one of the 14 fields of competition (clustered in five macro fields), although they can adopt another (two or three) supporting fields. The fields of competition, according to the FWC model, are as follows:

− Macro-field of competition in price: (1) the price itself, (2) payment conditions, and (3) prize and/or promotion;
− Macro-field of competition in product, goods or services: (4) product project, (5) product quality, and (6) variety of models;
− Macro-field of competition in attendance: (7) presales technological service, (8) assistance during sale, and (9) after-sales technical service;
− Macro-field of competition in delivery time: (10) deadline of budgeting and negotiation, and (11) product delivery deadline;
− Macro-field of competition in image: (12) product and brand name, (13) reliability of the company, and (14) social responsibility (civil and preservationist).

The test of the FWC model assumes that a company’s competitiveness is not determined by its choice of competitive strategy. Rather, it is determined by the correct alignment of its core competence (Hamel & Prahalad, 1995) with the chosen field of competition, whatever it may be. Evidently, the model assumes that it is necessary to choose for each product/market pair one of the fields that is of interest to the market.

For a better understanding of the problem in question, consider the data in Chart 1, extracted from one of the studies conducted by Contador (2008), with the set of 21 companies which, by degree of competitiveness (DC), were divided into two groups: the most and least competitive. To determine the degree of competitiveness of company i ($DC_i$), the FWC model normally uses the variation in the company's invoicing or net turnover over a given period of time.

Chart 1
Group I: Most competitive companies				Group II: Least competitive companies
Code	Main field of competition (denomination)	FC	DCi	Code	Main field of competition (denomination)	FC	DCi
E10	Product and brand image	A	1.51	E05	Variety of models	D	0.82
E13	Product delivery deadline	B	1.43	E11	After-sales service	C	0.80
E17	After-sales service	C	1.39	E06	Product and brand image	A	0.79
E19	After-sales service	C	1.32	E12	Product and brand image	A	0.79
E21	Variety of models	D	1.25	E04	Product and brand image	A	0.69
E02	Product and brand image	A	1.19	E14	Presales service	F	0.62
E08	Product project	E	1.16	E16	Product project	E	0.54
E03	After-sales service	C	1.14	E07	Product and brand image	A	0.47
E13	Product project	E	1.11	E09	Presales service	F	0.38
E01	Variety of models	D	1.07	E20	Product project	E	0.30
				E18	After-sales service	C	0.25

Evaluation level	Business units				Total
	A	B	C	D	
High	5	2	2	0	9
Average	0	1	0	1	2
Low	0	2	3	4	9
Totals	5	5	5	5	20

RN in the interval	Variable C
[0, 1/6)	A
[1/6, 2/6)	B
[2/6, 3/6)	C
[3/6, 4/6)	D
[4/6, 5/6)	E
[5/6, 1]	F
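
The interval table above can be read as a rule for turning a uniform random number (RN) into one of six equally likely categories. The following is a minimal sketch of that rule, assuming equal-width intervals as in the table; the names `category_from_rn` and `draw_sample` are hypothetical and do not come from the article.

```python
import random

CATEGORIES = ["A", "B", "C", "D", "E", "F"]

def category_from_rn(rn, categories=CATEGORIES):
    """Map a random number in [0, 1] to a category using equal-width intervals."""
    m = len(categories)
    # int(rn * m) is the interval index; min() keeps rn == 1.0 in the last interval.
    return categories[min(int(rn * m), m - 1)]

def draw_sample(n, rng=random):
    """Draw n categories, each from an independent uniform random number."""
    return [category_from_rn(rng.random()) for _ in range(n)]

if __name__ == "__main__":
    print(draw_sample(12, random.Random(2008)))
```

Passing a seeded `random.Random` instance makes a simulated sample reproducible, which is convenient when power curves are estimated over many repetitions.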

Strategy (j)	$f_j$	$g_j$	$u_j$	$v_j$	$|u_j - v_j|$
A	2	3	0.167	0.250	0.083
B	1	0	0.083	0.000	0.083
C	3	3	0.250	0.250	0.000
D	2	1	0.167	0.083	0.083
E	2	2	0.167	0.167	0.000
F	2	3	0.167	0.250	0.083
Sum	12	12	1.000	1.000	0.333
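
The columns of the table above follow directly from the counts: each relative frequency is a count divided by its sample size (12 in both groups), and the last column is the absolute difference between the two relative frequencies. A minimal sketch reproducing those columns is given below; the function names are illustrative, and the sketch does not cover how the resulting sum is compared with a critical value.

```python
# Counts per strategy for the two samples, copied from the table.
f = {"A": 2, "B": 1, "C": 3, "D": 2, "E": 2, "F": 2}
g = {"A": 3, "B": 0, "C": 3, "D": 1, "E": 2, "F": 3}

def relative_frequencies(counts):
    """Divide each count by the sample size."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def sum_abs_differences(f, g):
    """Sum over categories of |u_j - v_j|."""
    u = relative_frequencies(f)
    v = relative_frequencies(g)
    return sum(abs(u[k] - v[k]) for k in u)

d = sum_abs_differences(f, g)
print(round(d, 3))  # 0.333, matching the "Sum" row
```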

	m
α	3	4	5	6	7	8
0.05	1.143	1.250	1.200	1.167	1.143	1.125
0.01	1.429	1.500	1.400	1.333	1.286	1.250

Test	Probability of acceptance (Pa), %					CI and risks (%)
	DS=0	DS=0.2	DS=0.4	DS=0.6	DS=0.8	CI	α	β average
Exact	98	94	80	24	0	5.6	2	50
Chi-square	95	89	73	16	0	5.9	5	45
Uniform	97	92	77	20	0	6.6	3	47
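
The risk columns in the table above are numerically consistent with a simple reading of the acceptance probabilities: α equals 100 minus Pa at DS = 0, and the average β equals the mean of Pa over the four points with DS > 0. This reading is inferred from the numbers and assumes that DS = 0 corresponds to H₀ being true; the sketch below (with a hypothetical function name) only checks that arithmetic and does not compute CI.

```python
# Acceptance probabilities (%) at DS = 0, 0.2, 0.4, 0.6, 0.8, copied from the table.
PA = {
    "Exact": [98, 94, 80, 24, 0],
    "Chi-square": [95, 89, 73, 16, 0],
    "Uniform": [97, 92, 77, 20, 0],
}

def risks(pa):
    """Return (alpha, average beta) in percent from a list of acceptance probabilities."""
    alpha = 100 - pa[0]               # H0 rejected although DS = 0
    beta = sum(pa[1:]) / len(pa[1:])  # H0 accepted although DS > 0, averaged
    return alpha, beta

for test, pa in PA.items():
    print(test, risks(pa))
```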

Parameter	m	Test
		Exact	Chi-square	Uniform
Risk α (%)	3	2.0	5.0	3.0
	4	3.0	4.0	4.0
	5	4.0	3.0	5.0
	6	10.0	8.0	9.0
	7	10.0	5.0	6.0
	8	5.0	3.0	9.0
	Average value	5.7	4.7	6.0
Risk β average (%)	3	49.5	44.5	47.3
	4	58.8	58.0	62.5
	5	51.8	55.0	54.8
	6	49.8	54.5	51.8
	7	49.0	53.3	53.8
	8	44.5	50.3	50.0
	Average value	50.5	52.6	53.3
CI	3	5.9	6.6	5.6
	4	2.3	1.3	2.3
	5	4.1	3.4	4.2
	6	3.5	2.9	4.3
	7	2.7	2.3	3.5
	8	5.8	3.9	4.9
	Average value	3.9	3.1	4.1

DS	Tests
	Exact	Chi-square	Uniform
0.00	566	572	564
0.20	44	39	51
0.40	158	137	141
0.60	410	390	366
0.80	575	572	562
All	1753	1710	1684

FC	j	$f_{j}$	$g_{j}$
A	1	2	4
B	2	1	0
C	3	3	2
D	4	2	1
E	5	2	2
F	6	0	2
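
To illustrate why the chi-square test is questionable for data this sparse, the sketch below computes the Pearson chi-square statistic and the expected counts for the frequencies $f_{j}$ and $g_{j}$ in the table above. It relies on the common rule of thumb that the chi-square approximation becomes unreliable when expected counts fall below 5; that threshold and the function name are assumptions of this sketch, not part of the article.

```python
f = [2, 1, 3, 2, 2, 0]  # group I counts for categories A..F
g = [4, 0, 2, 1, 2, 2]  # group II counts for categories A..F

def chi_square_two_samples(f, g):
    """Pearson chi-square statistic and expected counts for a 2 x m table."""
    n1, n2 = sum(f), sum(g)
    n = n1 + n2
    stat = 0.0
    expected = []
    for fj, gj in zip(f, g):
        col = fj + gj
        if col == 0:
            continue  # a category empty in both samples contributes nothing
        e1, e2 = col * n1 / n, col * n2 / n
        expected.append((e1, e2))
        stat += (fj - e1) ** 2 / e1 + (gj - e2) ** 2 / e2
    return stat, expected

stat, expected = chi_square_two_samples(f, g)
small = sum(1 for e1, e2 in expected if min(e1, e2) < 5)
print(f"chi-square = {stat:.3f}")
print(f"categories with an expected count below 5: {small} of {len(expected)}")
```

With only 10 and 11 observations spread over six categories, every category has an expected count below 5, which is the situation the article describes as small samples with scanty data.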

Sample	j	1	2	3	4	5	6	7	8	9	10	11
A₁	$f_j$	5	4	5	4	5	4	4	4	4	4	5
A₂	$g_j$	2	1	2	1	2	1	5	5	4	5	5

Groups			Groups			Groups
I	II		I	II		I	II
6	3	9	7	2	9	8	1	9
2	6	8	1	7	8	0	8	8
8	9	17	8	9	17	8	9	17
(a)			(b)			(c)
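
Tables (a), (b) and (c) above share the same row totals (9 and 8) and column totals (8 and 9). Under fixed margins, the probability of each 2x2 table is hypergeometric, which is the basis of Fisher's exact test. The sketch below computes these probabilities with the standard library only; reading (b) and (c) as the more extreme tables in the same direction as (a) is an assumption of this sketch, and the function name is illustrative.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Cells listed row by row: (group I, group II) for the first row, then the second row.
tables = {"a": (6, 3, 2, 6), "b": (7, 2, 1, 7), "c": (8, 1, 0, 8)}

for label, cells in tables.items():
    print(label, table_probability(*cells))

# One-sided p-value for table (a): sum over tables at least as extreme in that direction.
p_one_sided = sum(table_probability(*cells) for cells in tables.values())
print("one-sided p-value for (a):", p_one_sided)
```

Because the first cell cannot exceed 8 with these margins, the three tables exhaust the outcomes with a first cell of 6 or more, so the sum is a complete one-sided tail.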

5	2	2	0	9	4	3	2	0	9
0	0	0	2	2	1	0	0	1	2
0	3	3	3	9	0	2	3	4	9
5	5	5	5	20	5	5	5	5	20
(a)					(b)

