Introduction
Cashew kernel is the main commercial product of the cashew tree (Anacardium occidentale L.), and it is therefore essential to the selection process in projects for the genetic improvement of the crop. Because the kernel is enclosed in the nut, it cannot be used directly in the evaluation and selection of genotypes. As a result, nut production, size and weight are the characteristics most widely used for plant selection, both in commercial plantations and in research, particularly in cashew tree genetic improvement (^{Paiva et al., 2005}; ^{Barros et al., 2008}).
Although the correlation between nut size and nut weight shows that the two variables are related, an evaluation based solely on this correlation may lead to errors in the selection process because of the large variability involved. ^{Aliyu & Awopetu (2011)} found coefficients of variation of 25% for the kernel weight to nut weight ratio, 40.68% for kernel weight and 52.02% for nut weight. Evaluating the weight and size of all fruit of all plants in such experiments, or even of all plants in a single plot, is impractical, as cashew trees produce a large amount of fruit during the season. It is therefore essential to use sampling techniques with appropriately sized samples that can act as reliable estimators of these characteristics.
Sampling studies, however, are always subject to some degree of uncertainty, as only part of the population is evaluated. This uncertainty can be reduced by collecting more sampling units (larger samples) and by using better measuring instruments. Specifying the level of precision desired in the results is therefore an essential precaution, as it defines the size of the error that is permitted, or tolerated, and the probability associated with that error under the sampling plan (^{Cochran, 1977}).
Several studies, for various purposes, have been based on samples of reproductive agronomic characteristics and of cashew nuts and kernels. However, none of them stated the level of precision desired in the results, which weakens the conclusions drawn. ^{Aliyu (2006)} used a sample of 40 cashew nuts to analyse the yield components of the cashew tree and to quantify the phenotypic relationships between nut production and other agronomic characteristics, finding significant positive correlations (r = 0.844 to r = 0.988) between production and these characteristics.
^{Aliyu & Awopetu (2011)} collected 50 fruit to evaluate the relationships among nut size and weight, kernel size and weight, and the kernel to nut ratio, relating them to market requirements. ^{Chacko (1997)} used a sample of 100 nuts per plant to determine mean nut weight and to identify high-yielding trees capable of producing medium to large nuts (6 g to 10 g) with kernels over 1.8 g.
^{Almeida et al. (1992)} studied the physical characteristics of nuts and kernels from the progeny of four dwarf cashew clones, CCP 06, CCP 09, CCP 76, and CCP 1001, to evaluate the respective genotypes based on these characteristics. Among other characteristics, they evaluated the weight, length, width, and thickness of the nut, in a sample of 40 nuts harvested at random from each parent plant. ^{Garruti & Cordeiro (1993)} took samples of 25 nuts per clone to evaluate, among other biometric characteristics, the weight, length and diameter of nuts in four clones. ^{Sardinha et al. (1998)}, with the aim of selecting superior genotypes and expanding the cashew germplasm collection in Guinea-Bissau, selected 42 trees, taking a sample of 20 nuts from each tree, to evaluate the following physical characteristics: length, width, greatest thickness, and smallest thickness.
^{Vale et al. (2014)} estimated genetic parameters and potential production performance in seven full-sib progenies originating from dwarf cashew crosses. Among other characteristics, these authors evaluated production (kg ha^{-1}) and mean nut weight (g), the latter estimated from a sample of 20 nuts per progeny, with all nuts in a sample weighed together. ^{Cavalcanti et al. (2012)} evaluated the production potential of 84 cashew clones, estimated genetic parameters and identified quantitative trait loci (QTL) associated with disease and with plant characteristics such as production (kg ha^{-1}) and nut weight (g). A sample of 20 nuts was harvested from each clone to estimate nut weight, which ranged from 4.15 g to 12.48 g, with a mean of 7.41 g.
^{Lima et al. (2015)} developed a “simplified protocol” for processing cashew nuts, as an aid to the evaluation and selection of progeny in the cashew genetic improvement project, and set a sample size of 100 nuts. No level of precision was stated for this sample size, nor was the sampling methodology described, which may limit its application in further research.
The objective of this work was to estimate sample sizes for cashew nuts, specifying the size of the error permitted or tolerated in the estimates, and the desired levels of precision in the results, as an aid to programmes of genetic improvement.
Materials and Methods
The base population of this research consisted of a mixture of nuts of different sizes from the 2015/2016 harvest, originating from experiments with open-pollinated hybrid progenies involving dwarf and common cashew.
The experiments were located in the experimental area of Embrapa Agroindústria Tropical, in the municipality of Pacajus (4°11'27" S, 38°29'51" W, at 60 m altitude), in the state of Ceará, Brazil. To include the greatest possible variation in the characteristics under study, nuts were collected from the ground under the crowns of the various genotypes, without identifying the genotype from which they came.
After harvesting, the nuts were spread out in the sun to dry for three days in a cement dryer and turned over several times a day, as recommended by ^{Paiva & Silva Neto (2013)}. Under these conditions, this drying time is sufficient to reduce moisture to 10% for storage (^{Lima, 2013}).
After drying, the nuts were separated by size in a cylindrical sorter of perforated plates with circular meshes from 17 mm to 25 mm in diameter (Figure 1), which let the nuts pass through according to their size. The 17 mm meshes let the smallest nuts through; for the purpose of this work these were classified as size 1 (S1). The 19 mm meshes let through size 2 nuts (S2 > S1), the 23 mm meshes size 3 nuts (S3 > S2), and the 25 mm meshes size 4 nuts (S4 > S3). Size 5 nuts (S5 > S4), the large nuts that pass over every mesh without going through any of them, were sent directly to the receiver at the far right of the sorter (Figure 1C). After sorting, the nuts were packed into plastic boxes labelled with the sizes selected by the sorter (S1 to S5). In addition, a further set representing the original batch of nuts was obtained by random selection from several points in the dryer. These nuts were not separated by size, that is, they formed a mix of different sizes, identified as Sm, and were packed into a plastic box like the other sizes.
The study therefore followed a uniform stratified random sampling design, in which the population is subdivided into subgroups (strata) that are homogeneous within themselves and heterogeneous between one another (^{Cochran, 1977}; ^{Scheaffer et al., 2011}; ^{Ryan, 2013}). The sampling plan consisted of six strata, each represented by a nut size (S1, S2, ..., S5, Sm). Three samples of 200 nuts were then drawn at random, without replacement, from each stratum, identified and packed separately into plastic bags. The remaining nuts were returned to their respective boxes.
Each sample was cleaned by removing nuts considered unsuitable for industry (shrivelled, punctured or damaged) and foreign matter such as sand and stones. The discarded nuts in each sample were replaced by intact nuts of the same size taken from the respective boxes. The nuts in each sample were then weighed individually on a BEL model S2202 electronic balance (2,000 g maximum capacity, 0.01 g precision).
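The drawing of three disjoint samples of 200 nuts per stratum, without replacement, can be sketched in Python as below. This is an illustration only: `draw_samples` is a hypothetical helper, and the nut identifiers and the stratum size of 1,000 nuts are made up, since the number of nuts in each box is not reported.

```python
import random


def draw_samples(strata, n_samples=3, sample_size=200, seed=None):
    """Draw n_samples disjoint random samples of sample_size nuts from each stratum.

    strata -- dict mapping a stratum label (e.g. "S1") to a list of unique nut IDs
    Returns (samples, remaining): samples[label] is a list of n_samples lists,
    and remaining[label] holds the nuts returned to that stratum's box.
    """
    rng = random.Random(seed)
    samples, remaining = {}, {}
    for label, nuts in strata.items():
        needed = n_samples * sample_size
        if needed > len(nuts):
            raise ValueError(f"stratum {label} has only {len(nuts)} nuts")
        # A single draw without replacement, then split into disjoint samples
        drawn = rng.sample(nuts, needed)
        samples[label] = [drawn[j * sample_size:(j + 1) * sample_size]
                          for j in range(n_samples)]
        chosen = set(drawn)
        remaining[label] = [nut for nut in nuts if nut not in chosen]
    return samples, remaining


# Hypothetical strata of 1,000 numbered nuts each
strata = {label: [f"{label}-{k}" for k in range(1000)]
          for label in ["S1", "S2", "S3", "S4", "S5", "Sm"]}
samples, remaining = draw_samples(strata, seed=2016)
```

Drawing all 600 nuts of a stratum at once and then splitting them guarantees that no nut appears in two samples, which is what sampling without replacement requires here.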
Composed as above, the strata meet the principles established by ^{Scheaffer et al. (2011)} and ^{Pfeffermann & Rao (2009)}, namely large strata of similar size. After the nuts were weighed, the normality of the weight variable was verified in each stratum. Under these conditions, at α = 0.05 the two-sided standard normal quantile is 1.96, approximately 2.0, so the maximum error B permitted in the estimates, or level of precision desired in the results, corresponds to two standard errors of the estimated mean. The variance of the estimated mean fixed by the precision of the sampling plan is therefore D = B^{2}/4. According to ^{Scheaffer et al. (2011)} and ^{Pfeffermann & Rao (2009)}, under these conditions, and assuming that the cost of observation is the same for all strata, the allocation, or sample size n_{i} of the i^{th} stratum (Neyman allocation), is obtained by
$$n_i = n\,\frac{N_i \sigma_i}{\sum_{k=1}^{L} N_k \sigma_k}$$
in which: L is the number of strata, in this case 6 (the six sizes of nut); N_{i} is the size of the i^{th} stratum (i = 1, 2, …, 6); σ_{i} is the standard deviation of the i^{th} stratum; N_{k} is the size of the k^{th} stratum (k = 1, 2, …, 6); σ_{k} is the standard deviation of the k^{th} stratum; and n is the total sample size.
As the indices i and k run over the same strata, replacing each σ by its sample estimate s gives
$$n_i = n\,\frac{N_i s_i}{\sum_{k=1}^{L} N_k s_k} \qquad (1)$$
and the total sample size n, according to ^{Scheaffer et al. (2011)}, is given by
$$n = \frac{\left(\sum_{i=1}^{L} N_i s_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i s_i^2} \qquad (2)$$
in which: N is the number of sampling units in the population, N = N_{1} + N_{2} + ... + N_{L}; s_{i}^{2} is the estimated variance of the i^{th} stratum (i = 1, 2, ..., 6); and D is the fixed variance of the estimated mean in the sampling plan, associated with the maximum error permitted in the estimates, D = B^{2}/4.
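For strata of equal size, as in a uniform stratified design, the Neyman allocation and total sample size simplify further. The derivation below assumes N_{i} = N/L for every stratum and, in its last step, a population large enough for the finite-population term to be negligible; N itself is not reported, so this is an assumption rather than a statement of how the tables were computed.

```latex
% Neyman allocation and total sample size (Scheaffer et al., 2011), with D = B^2/4
n_i = n\,\frac{N_i s_i}{\sum_{k=1}^{L} N_k s_k},
\qquad
n = \frac{\left(\sum_{i=1}^{L} N_i s_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}

% Equal stratum sizes, N_i = N/L: divide numerator and denominator by N^2/L^2
n_i = n\,\frac{s_i}{\sum_{k=1}^{L} s_k},
\qquad
n = \frac{\left(\sum_{i=1}^{L} s_i\right)^2}{L^2 D + \frac{L}{N}\sum_{i=1}^{L} s_i^2}

% Large N: the term (L/N) \sum s_i^2 becomes negligible
n \approx \frac{\left(\sum_{i=1}^{L} s_i\right)^2}{L^2 D}
  = \frac{4\left(\sum_{i=1}^{L} s_i\right)^2}{L^2 B^2}
```

As a worked example with the Table 1 standard deviations of S5 to S1 (Σ s_{i} = 1.9278 + 1.3898 + 0.8830 + 0.6034 + 0.5731 = 5.3771 g, L = 5) and B = 0.1 g, n ≈ 4(5.3771)^{2}/(25 × 0.01) ≈ 462.6, which rounds up to 463 nuts. Because n is proportional to 1/B^{2}, doubling B to 0.2 g divides it by four (≈ 115.7, that is, 116 nuts), matching the first two rows of Table 2.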
With the variances (s^{2}) and standard deviations (s) estimated for each stratum substituted into equations 1 and 2, the sample size n_{i} (i = 1, 2, ..., 6) of each stratum and the total sample size n were estimated for each maximum error level B permitted in the estimates, or precision desired in the results. Because the values set for B must be related to the unit of measure of the variable under study, and nut weights are relatively small, B must be chosen carefully. The values adopted for B were therefore 0.1 g, 0.2 g, 0.3 g, 0.4 g, 0.5 g, 0.6 g, 0.7 g, 0.8 g, 0.9 g and 1.0 g; for each of them, the total sample size and the sample size of each stratum were estimated from the stratum variances and standard deviations.
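The calculation from equations 1 and 2 can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes strata of equal size (N_{i} = N/L), treats the population as effectively infinite unless N is supplied, rounds n up and each n_{i} to the nearest whole nut (with a minimum of one). `neyman_sample_sizes` is a hypothetical helper name.

```python
import math


def neyman_sample_sizes(sds, B, N=None):
    """Total sample size n and Neyman allocation n_i for L strata of equal size.

    sds -- sample standard deviation s_i of each stratum (g)
    B   -- maximum error permitted in the estimated mean (g); D = B**2 / 4
    N   -- population size; None treats the population as effectively infinite
    """
    L = len(sds)
    D = B ** 2 / 4
    sum_s = sum(sds)
    # Finite-population term (L/N) * sum(s_i^2); zero when N is treated as infinite
    fpc = 0.0 if N is None else L * sum(s ** 2 for s in sds) / N
    # Equation 2 with N_i = N/L, rounded up so that the bound B is not exceeded
    n = math.ceil(sum_s ** 2 / (L ** 2 * D + fpc))
    # Equation 1 with N_i = N/L: n_i proportional to s_i, at least one nut per stratum
    n_i = [max(1, round(n * s / sum_s)) for s in sds]
    return n, n_i


# Standard deviations (g) of strata S5, S4, S3, S2, S1 (Table 1)
SDS = [1.9278, 1.3898, 0.8830, 0.6034, 0.5731]

for B in [round(0.1 * k, 1) for k in range(1, 11)]:
    n, alloc = neyman_sample_sizes(SDS, B)
    print(f"B = {B:.1f} g   n = {n:3d}   n5..n1 = {alloc}")
```

For example, `neyman_sample_sizes(SDS, 0.2)` returns `(116, [42, 30, 19, 13, 12])`, the B = 0.2 g row of Table 2. Since the rounding rule behind the published table is not stated, some other rows produced by this sketch may differ from it by one nut, and the rounded n_{i} need not add up exactly to n.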
Results and Discussion
Assuming B = 0.2 g as the maximum error level in the estimates, or level of precision desired in the results, the sample size for stratum S4, whose variance is s^{2} = 1.9315 g^{2}, is n_{4} = 30 nuts; for stratum S5, whose variance is s^{2} = 3.7166 g^{2}, it is n_{5} = 42 nuts. Thus, for the same permitted error, or desired precision, the sample size varies with the variance of the stratum. The smaller nut strata, S2 and S1, are the most uniform and showed the smallest variances, s^{2} = 0.3641 g^{2} and s^{2} = 0.3285 g^{2}, respectively (Table 1), so they require the smallest samples at any error level or desired precision (Table 2).
Table 1. Descriptive statistics of nut weight in each stratum (nut size).
Statistic | S5 | S4 | S3 | S2 | S1 | Sm |
Minimum (g) | 8.40 | 6.00 | 6.00 | 4.00 | 1.80 | 1.00 |
Maximum (g) | 21.80 | 14.60 | 10.00 | 7.20 | 5.20 | 15.80 |
Mean (g) | 12.71 | 9.76 | 7.26 | 5.29 | 3.84 | 6.79 |
Variance (s^{2}, g^{2}) | 3.7166 | 1.9315 | 0.7797 | 0.3641 | 0.3285 | 5.3366 |
Standard deviation (s, g) | 1.9278 | 1.3898 | 0.8830 | 0.6034 | 0.5731 | 2.3101 |
Coefficient of variation (%) | 15.17 | 14.24 | 12.15 | 11.40 | 14.91 | 34.03 |
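Table 2. Total sample size (n) and sample size of each nut size stratum (n_{5} to n_{1}) for each maximum error B permitted in the estimated mean nut weight.
B (g) | n | n_{5} | n_{4} | n_{3} | n_{2} | n_{1} |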
0.1 | 463 | 166 | 120 | 76 | 52 | 49 |
0.2 | 116 | 42 | 30 | 19 | 13 | 12 |
0.3 | 52 | 19 | 13 | 9 | 6 | 5 |
0.4 | 29 | 10 | 8 | 5 | 3 | 3 |
0.5 | 19 | 7 | 5 | 3 | 2 | 2 |
0.6 | 13 | 5 | 3 | 2 | 2 | 1 |
0.7 | 10 | 3 | 3 | 2 | 1 | 1 |
0.8 | 8 | 3 | 2 | 1 | 1 | 1 |
0.9 | 6 | 2 | 1 | 1 | 1 | 1 |
1.0 | 5 | 1 | 1 | 1 | 1 | 1 |
The Sm stratum showed the largest variance (s^{2} = 5.3366 g^{2}) and therefore required the largest sample. For B = 0.2 g, its sample size is n_{m} = 50 nuts, and the total sample size, including Sm, would be n = 165 nuts. For B = 0.1 g, the sample size is n_{m} = 198 nuts and the total n = 657 nuts; as expected, these are the largest sample sizes at every level of B. Note that the lower the error permitted in the estimates, that is, the greater the precision desired in the results, the larger the sample size, whatever the nut size.
The Sm stratum, formed by a mixture of nuts of several sizes, should not be used as a parameter in this context because its variability is far greater than that of the other strata. When Sm is included, the total sample size n is about 42% larger than the total for the size strata S5, S4, ..., S1 alone (equivalently, the latter is 70.45% of the former), irrespective of the error level B permitted in the estimates, or the precision desired in the results. Because Sm does not represent the reality of research or of marketing, it was not included in Table 2, which shows only the nut size strata S5, S4, ..., S1.
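Under equal stratum sizes and a large population, D cancels when the two totals are divided, so their ratio depends only on the standard deviations and not on B. A minimal sketch under those assumptions (`total_sample_size` is a hypothetical helper):

```python
def total_sample_size(sds, B):
    """Large-N total n for L equal-size strata under Neyman allocation (not rounded)."""
    L = len(sds)
    D = B ** 2 / 4
    return sum(sds) ** 2 / (L ** 2 * D)


SIZE_STRATA = [1.9278, 1.3898, 0.8830, 0.6034, 0.5731]  # S5..S1, Table 1 (g)
SM = 2.3101  # mixed stratum Sm, Table 1 (g)

for B in (0.1, 0.2, 0.5):
    without_sm = total_sample_size(SIZE_STRATA, B)
    with_sm = total_sample_size(SIZE_STRATA + [SM], B)
    # D cancels in the ratio, so it is the same for every B
    print(f"B = {B} g: {without_sm:.1f} vs {with_sm:.1f}, ratio = {without_sm / with_sm:.4f}")
```

The printed ratio is about 0.7046 for every B, in line with the 70.45% given in the text; its inverse, about 1.42, is the roughly 42% increase in total sample size caused by including Sm.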
These results agree with those of ^{Thompson (2012)}, ^{Pfeffermann & Rao (2009)} and ^{Ryan (2013)}, among others, according to whom three factors determine the sample size: the confidence level (the higher the confidence level, the larger the sample); the maximum error permitted in the estimates (the smaller the permitted error, the larger the sample); and the variability of the phenomenon under investigation (the greater the variability, the larger the sample). The data in Table 2 can therefore serve as a basis for choosing the sample size in research that requires sampling of cashew nuts, according to the error of estimate that the researcher accepts as reasonable and the precision desired in the results.
In this case, stratified random sampling behaved as described in the literature, reducing sampling error and improving the precision of the estimates, as reported by ^{Ryan (2013)}, ^{Scheaffer et al. (2011)} and ^{Sabino & Villaça (1999)}. ^{Rossetti (2001)}, in a preliminary study to estimate the mean weight of cashew nuts using only simple random sampling, obtained, for B = 0.1 g and the same nut size strata, n_{5} = 1,487, n_{4} = 773 and n_{3} = 312 nuts. Comparing these values with those of the present study (Table 2) confirms the effectiveness of stratified random sampling.
Although heavier kernels have a higher market value, the indirect indicators used to select the genotypes that produce them are nut size and weight, because no analysis of the kernel is available at that stage. Considering the most marketed class in the world (W 320), with a kernel weight of 1.5 g (^{Garruti et al., 2015}), and a desirable kernel to nut yield of 25%, the corresponding nut weight is 6.0 g (1.5 g / 0.25). However, given the preference of Brazilian producers for larger kernels, a nut weight of 8.0 g is the reference value adopted for genetic improvement at Embrapa Agroindústria Tropical, which uses the following classification: small nut (weight < 8 g), medium nut (8 g ≤ weight < 12 g) and large nut (weight ≥ 12 g). On this basis, the proportion of nuts in each class, which indicates the position of each stratum within this classification (Table 3), was estimated for each nut size (stratum) in the present research. The variance decreases as nut size decreases (Table 1), indicating greater uniformity among the smaller nuts.
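For comparison, the simple-random-sampling size for estimating a mean within a bound B, n = N s^{2}/((N − 1)D + s^{2}), reduces to n ≈ s^{2}/D for a large population. The sketch below applies this large-N form to the Table 1 variances; it is an illustration, and whether Rossetti (2001) used exactly this calculation is not stated in the source. The figures it gives bound each stratum mean separately, whereas the stratified sizes in Table 2 bound the overall stratified mean.

```python
import math

# Stratum variances (g^2), Table 1
VARIANCES = {"S5": 3.7166, "S4": 1.9315, "S3": 0.7797, "S2": 0.3641, "S1": 0.3285}


def srs_sample_size(variance, B):
    """Large-N simple random sample size for a mean: n = s^2 / D, with D = B^2 / 4."""
    D = B ** 2 / 4
    return math.ceil(variance / D)


srs = {stratum: srs_sample_size(v, 0.1) for stratum, v in VARIANCES.items()}
print(srs)
```

For B = 0.1 g this gives 1,487, 773 and 312 nuts for S5, S4 and S3, which coincide with the values quoted above, against 166, 120 and 76 nuts for the same strata in Table 2.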
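The nut-weight arithmetic and the classification behind Table 3 can be sketched as follows. The helper names are hypothetical, and because the individual nut weights of the study are not reproduced here, the list of weights in the example is made up.

```python
from collections import Counter


def nut_weight_from_kernel(kernel_weight_g, kernel_yield):
    """Nut weight implied by a kernel weight and a kernel-to-nut yield (fraction)."""
    return kernel_weight_g / kernel_yield


def embrapa_class(weight_g):
    """Classification used at Embrapa Agroindústria Tropical (thresholds in g)."""
    if weight_g >= 12:
        return "large"
    if weight_g >= 8:
        return "medium"
    return "small"


def class_percentages(weights_g):
    """Percentage of nuts in each weight class, as tabulated in Table 3."""
    counts = Counter(embrapa_class(w) for w in weights_g)
    return {c: 100 * counts[c] / len(weights_g) for c in ("large", "medium", "small")}


print(nut_weight_from_kernel(1.5, 0.25))  # 6.0 g for a W 320 kernel at 25% yield
# Hypothetical individual weights (g)
print(class_percentages([12.4, 9.8, 11.1, 7.6, 13.0]))
```

Applied to the 600 weighed nuts of each stratum, `class_percentages` would give one column of Table 3.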
Table 3. Percentage of nuts in each weight class of the Embrapa Agroindústria Tropical classification, by stratum (nut size).
Nut weight class | S5 | S4 | S3 | S2 | S1 | Sm |
Large: weight ≥ 12 g (%) | 62.83 | 6.17 | 0.00 | 0.00 | 0.00 | 2.83 |
Medium: 8 g ≤ weight < 12 g (%) | 37.17 | 86.00 | 21.33 | 0.00 | 0.00 | 25.50 |
Small: weight < 8 g (%) | 0.00 | 7.83 | 78.67 | 100.00 | 100.00 | 71.67 |
Total (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Within the classification used for cashew genetic improvement, the S5 and S4 nut strata are the most relevant (Table 3). In S5, 62.83% of the nuts are large (weight ≥ 12 g), 37.17% are medium (8 g ≤ weight < 12 g) and none are small (weight < 8 g); the mean nut weight of this stratum is 12.71 g (Table 1). In S4, although only 6.17% of the nuts are large, 86% are medium, and the mean nut weight of 9.76 g (Table 1) is higher than the 8.4 g reported for CCP 76 by ^{Ribeiro et al. (2004)} and in line with producers' preference. Taking B = 0.2 g as an example and applying it to the mean of stratum S4 (9.76 g), the stratum that contains the minimum nut weight for selection (8 g), the permitted error represents only 2.05% of the mean, which may be considered sufficient for the selection process.
In stratum S3 there were no large nuts, and most of the nuts (78.67%) were small (weight < 8 g). Although 21.33% were medium-sized (8 g ≤ weight < 12 g), the mean nut weight was 7.26 g (Table 1), slightly below the 8 g reference preferred by Brazilian producers. In strata S2 and S1, all nuts (100%) were small (weight < 8 g), with mean weights of 5.29 g and 3.84 g, respectively (Table 1).
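The 2.05% figure is the bound B expressed relative to the stratum mean. The sketch below repeats that calculation for every size stratum with the Table 1 means; `relative_error_pct` is a hypothetical helper.

```python
# Mean nut weight (g) of each size stratum, Table 1
MEANS = {"S5": 12.71, "S4": 9.76, "S3": 7.26, "S2": 5.29, "S1": 3.84}


def relative_error_pct(B, mean):
    """Permitted error B expressed as a percentage of the stratum mean."""
    return 100 * B / mean


for stratum, mean in MEANS.items():
    print(f"{stratum}: B = 0.2 g is {relative_error_pct(0.2, mean):.2f}% of {mean} g")
```

For S4 this gives 2.05%, as in the text, while the same B = 0.2 g is about 5.21% of the S1 mean, so a fixed absolute bound is a looser relative bound for small nuts.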
The weight variability within the size strata (Table 1) results from the criteria adopted, for practical convenience, in a genetic improvement project aimed at meeting the demands of domestic producers. These criteria differ slightly from those of the marketing standards for cashew nut (^{Brasil, 1975}), whose classes are: large nut (weight ≥ 11.11 g), medium nut (7.14 g ≤ weight < 11.11 g), small nut (4.55 g ≤ weight < 7.14 g) and tiny nut (weight < 4.55 g). Even so, the two classifications largely coincide: for example, the 62.83% of S5 nuts weighing 12 g or more are classed as large under both.
Further variability can be introduced by the position in which a nut meets the sorter mesh and by its morphometry, in terms of length and width (Figure 2). In the first case, S3 or S2 nuts, for instance, may occasionally end up among the S5 or S4 nuts, and likewise for the other sizes. In the second case, nuts of different widths can pass through different meshes, yet have their weights grouped in the same class because of their length (Figure 2). Both situations may increase the variance and, consequently, the sample size.
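The two classifications can be compared side by side with a short sketch; the function names are hypothetical and the thresholds are those given in the text. The example weights are chosen only to fall on either side of the class limits of both schemes.

```python
def embrapa_class(weight_g):
    """Genetic-improvement classification (Embrapa Agroindústria Tropical), grams."""
    if weight_g >= 12:
        return "large"
    if weight_g >= 8:
        return "medium"
    return "small"


def brasil_1975_class(weight_g):
    """Marketing-standard classes (Brasil, 1975), grams."""
    if weight_g >= 11.11:
        return "large"
    if weight_g >= 7.14:
        return "medium"
    if weight_g >= 4.55:
        return "small"
    return "tiny"


# Weights (g) on either side of the class limits of both schemes
for w in (3.9, 5.0, 7.5, 8.5, 11.5, 12.7):
    print(f"{w:5.2f} g  Embrapa: {embrapa_class(w):6s}  Brasil 1975: {brasil_1975_class(w)}")
```

Any nut of 12 g or more is large under both schemes; the two classifications disagree only for nuts between 11.11 g and 12 g, between 7.14 g and 8 g, and below 4.55 g, where the marketing standard has an extra "tiny" class.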
Conclusions
Uniform stratified random sampling is an effective method for estimating sample sizes for cashew (Anacardium occidentale) nuts.
The evaluation of nut size (based on weight) should precede the definition of sample size to be collected for the analysis of morphological characteristics.
The sample size of cashew nut, for the purpose of morphometric evaluation, depends on the variance of the stratum (nut size), as well as on the error level permitted, or tolerated, in the estimates, and on the desired precision in the results.
The stratum formed by a mixture of nuts of various sizes should not be used as a parameter in this context.
The sample size should be established according to the error that the researcher considers acceptable and the degree of precision desired in the results.