Sampling of cashew nuts from cashew tree clones

1 Math. DSc in Engineering and Knowledge Management, Embrapa Agroindústria Tropical, Fortaleza, Ceará, Brazil. E-mail: adroaldo.rossetti@embrapa.br
2 Agr. Eng. DSc in Genetics and Plant Breeding, Embrapa Agroindústria Tropical, Fortaleza, Ceará, Brazil. E-mail: vidal.neto@embrapa.br
3 Agr. Eng. DSc in Genetics and Plant Breeding, Embrapa Agroindústria Tropical, Fortaleza, Ceará, Brazil. E-mail: levi.barros@embrapa.br

Abstract - The aim of this work was to estimate sample sizes to assist the genetic improvement of the cashew tree (Anacardium occidentale L.). Stratified sampling, comprising five strata (S5, S4, S3, S2, and S1) of five cashew clones (BRS 274, BRS 275, BRS 226, BRS 189 and CCP 76), was effective for estimating the different sample sizes of the nut. Sample size for each clone depends on the nut-weight variance, the margin of error B permitted in the estimates and the desired precision of the results. Sample size increases with clone variance, with a lower permitted margin of error B, and with greater desired precision of the results. These clones required different sample sizes for a morphological study of the nuts. Larger nuts require larger samples for the same margin of error B. For an error B of 0.2 g, the sample sizes for strata S5, S4 and S3 were $n_5$ = 84, $n_4$ = 49 and $n_3$ = 37 nuts. For clones BRS 274 (S5) and BRS 275 (S4), with better nut classification, the mean weights were respectively 16.79 and 12.78 g. Clones BRS 189 (S2) and CCP 76 (S1), with smaller nuts, have smaller variances, $s_2^2$ = 0.7638 and $s_1^2$ = 1.0712, with mean weights of 8.29 and 7.81 g respectively.


Introduction
The cashew kernel is the principal commercial product of the cashew tree, and is therefore essential to the selection process in projects for the genetic improvement of the crop. Because the kernel is enclosed in the nut, it cannot be used directly in the evaluation and selection of genotypes. It follows that the production, size and weight of the nut are the most used characteristics in plant selection, both in commercial plantations and in research, particularly in the area of genetic improvement of the cashew tree (BARROS et al., 2008; PAIVA et al., 2005).
However, Aliyu and Awopetu (2011) argue that the high and significant correlations (r = 0.76 to r = 0.95) between the weight of the cashew kernel and the weight of the cashew nut indicate that nut size may be a reliable selection indicator for the size of the kernel. Although the existence of a correlation between the variables nut size and weight shows a dependent relationship between the two, an evaluation based simply on this correlation may induce errors into the selection process due to the large variability.
Aliyu and Awopetu (2011) found coefficients of variation of 25% for the kernel-weight to nut-weight ratio, 41% for the weight of the kernel and 52% for the weight of the nut. In fact, there are no visual indicators that might allow genotypes with kernels of greater weight and size to be identified from observations of the nut, or that might indicate the quality of the kernel. The establishment of such parameters requires the evaluation of several physical characteristics of both the nut and the kernel in the search for strong and unequivocal correspondences between them that allow a safe inference to be made of one from the other.
Evaluating the weight and size characteristics of all the fruit of all the plants in these experiments, and even of all the plants in any one plot, is impossible in practice, since cashew trees produce a large amount of fruit during the season. It is therefore essential to employ sampling techniques and appropriate sample sizes as reliable estimators of these characteristics.
However, studies involving sampling are always subject to some degree of uncertainty, as only part of the population is evaluated. This uncertainty can be reduced by collecting a larger number of sampling units (or larger samples) and using better measuring instruments. Consequently, specifying the level of precision desired in the results is an extremely important precaution, as it indicates the size of the error to be allowed or tolerated, and the probability of this error occurring during the sampling plan or in the precision of the sampling process (COCHRAN, 1977).
Several studies, for various purposes, have been carried out based on samples of reproductive and agronomic characteristics, and of the cashew nut and kernel. However, none of these mentioned the level of precision desired in the results, which weakens the conclusions drawn. Aliyu (2006) used a sample of 40 cashew nuts to analyse the production components of the cashew tree and to quantify the phenotypic relationships between nut production and other agronomic characteristics. He found significant positive correlations between production and agronomic characteristics that ranged from r = 0.844 to r = 0.988. Aliyu and Awopetu (2011) collected 50 fruit to evaluate the relationships between the size and weight of the nut, the size and weight of the kernel, and the kernel to nut ratio, associating these with market demand. Chacko (1997) used a sample of 100 nuts per plant to determine the mean weight of the nut and to identify trees with high yield potential, capable of producing medium to large nuts (6 g to 10 g) and kernels greater than 1.8 g. Almeida et al. (1992) studied the physical characteristics of nuts and kernels from the progeny of four dwarf-cashew clones, CCP 06, CCP 09, CCP 76, and CCP 1001, to evaluate their respective genotypes based on these characteristics. Among other characteristics, they evaluated the weight, length, width and thickness of the nut, in a sample of 40 nuts harvested at random from each parent plant. Garruti and Cordeiro (1993) took samples of 25 nuts per clone to evaluate, among other biometric characteristics, the weight, length and diameter of the nut, in four clones. Sardinha et al. (1998), with the aim of selecting superior genotypes and expanding the cashew germplasm collection in Guinea-Bissau, selected 42 trees, taking a sample of 20 nuts from each tree to evaluate the following physical characteristics: length, width, greatest thickness and smallest thickness.
Vale et al. (2014) estimated genetic parameters and potential production performance in seven full-sib progeny originating from crosses between dwarf cashew tree genotypes. Among other characteristics, they evaluated production (kg ha⁻¹) and mean nut weight (g), estimated from a sample of 20 nuts per progeny, where all the nuts from the same sample were weighed together. Cavalcanti et al. (2012) evaluated the production potential of 84 cashew clones, estimated genetic parameters and identified Quantitative Trait Loci (QTLs) associated with disease and various plant characteristics, such as production (kg ha⁻¹) and nut weight (g). A sample of 20 nuts from each clone was harvested to estimate the weight of the nut, which varied from 4.15 g to 12.48 g, with a mean of 7.41 g. Lima et al. (2015) developed a 'simplified protocol' to operationalise the processing of cashew nuts as an aid to the evaluation and selection of progeny from the cashew tree genetic improvement project. For this, they established the sample size at 100 nuts. There was no mention of the level of precision in the result for this sample size, or of the sampling methodology used, which might weaken its application in further research. In summary, sample sizes in these studies varied widely (from 20 to 100 nuts) and were chosen without considering the weight or size of the nut, its variance, or other methodological aspects crucial to a reliable definition of the number of nuts in the sample.
The work of Rossetti et al. (2019) was the only one found in which these aspects were addressed. In order to gather as much variability as possible in the characteristics under study, the authors estimated the size of cashew nut samples from experiments with open-pollinated hybrid progeny, including dwarf and common cashew trees. Sample size was estimated for each nut-size class, based on the variance of nut weight and on the maximum margin of error permitted for the estimates, or the desired precision in the results.
The aim of this research was to estimate sample size in cashew nuts, specifying the margin of error permitted or tolerated for the estimates and the desired levels of precision in the results, as an aid to the cashew tree genetic improvement programme.

Materials and Methods
The base population of this research was represented by five genotypes (BRS 274, BRS 275, BRS 226, BRS 189 and CCP 76), originating from experiments with clones of the dwarf cashew tree, conducted in the Experimental Area of Embrapa Agroindústria Tropical, in Pacajus in the state of Ceará, Brazil (4°11'26.62'' S; 38°29'50.78'' W; altitude 60 m), from the 2016/2017 harvest. The nuts were harvested from the ground, under the crowns of the plants of each genotype. After harvesting, the nuts were spread out in the sun to dry for three days in a cement-based dryer and turned over several times daily, as recommended by Paiva and Silva Neto (2013). Under such conditions, this drying time is sufficient to reduce the moisture to 10%, the recommended level for storage according to Lima (2013).
After the drying period, the nuts were packed in plastic boxes (Figure 1), with identification of the clone. As a result, the study followed a design of uniform stratified random sampling, which consists of subdividing the population into homogeneous subgroups (strata), in such a way as to have homogeneity within strata and heterogeneity between them (RYAN, 2013; SCHEAFFER et al., 2011; COCHRAN, 1977). The sampling plan consisted of five strata (S5, S4, ..., S1), each stratum being represented by a clone (BRS 274, BRS 275, BRS 226, BRS 189 and CCP 76). Three samples of 375 nuts were then randomly removed from each stratum and not replaced; these were identified and separately packed into plastic bags. The remaining nuts were returned to their respective boxes.
Rev. Bras. Frutic., Jaboticabal, 2020, v. 42, n. 1: (e-563)
Each sample was cleaned by removing any nuts considered unsuitable for industry (shrivelled, punctured or damaged), and also foreign matter such as sand, stones etc. Shrivelled, punctured and damaged nuts found in each sample were replaced by intact nuts from the same clone, removed from their respective boxes. The nuts from each sample were individually weighed on a BEL model S2202 electronic balance with a maximum capacity of 2,000 g and a precision of 0.01 g.
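The drawing of three non-overlapping random samples per stratum, without replacement, could be sketched as follows. This is only an illustration under stated assumptions: the function name `draw_disjoint_samples`, the use of integer identifiers for nuts and the stratum size of 5,000 are hypothetical and do not come from the study.

```python
import random


def draw_disjoint_samples(units, n_samples=3, sample_size=375, seed=None):
    """Draw n_samples non-overlapping random samples from one stratum.

    Units are drawn without replacement, so no nut appears in two samples.
    """
    needed = n_samples * sample_size
    if len(units) < needed:
        raise ValueError("stratum has fewer units than the samples require")
    rng = random.Random(seed)
    drawn = rng.sample(units, needed)  # simple random draw, no replacement
    # Split the single draw into consecutive, disjoint samples.
    return [drawn[i * sample_size:(i + 1) * sample_size]
            for i in range(n_samples)]


# Hypothetical strata: each clone is represented by a list of nut identifiers.
strata = {clone: list(range(5000))
          for clone in ["BRS 274", "BRS 275", "BRS 226", "BRS 189", "CCP 76"]}

samples = {clone: draw_disjoint_samples(units, seed=42)
           for clone, units in strata.items()}
```

Drawing all 1,125 units in one call and then splitting them guarantees that the three samples of a stratum never share a nut, which mirrors the "removed and not replaced" procedure.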
Composed as above, the strata fit the principles established by Scheaffer et al. (2011) and Pfeffermann and Rao (2009), i.e. large strata of a similar size. After weighing the nuts, the normality hypothesis was confirmed for the weight variable in each stratum. Under such conditions, at the α = 0.05 probability level, the standard normal quantile is approximately 2.0. Therefore, the variance of the mean that fixes the precision of the sampling plan is associated with the maximum error B permitted for the estimates, or the level of precision desired in the results, i.e. $\sigma^2 = B^2/4$. According to Scheaffer et al. (2011) and Pfeffermann and Rao (2009), under these conditions, and assuming that the cost of observation is the same for all strata, the allocation or sample size $n_i$ of the i-th stratum (Neyman allocation) is obtained by:

$$n_i = n \left( \frac{N_i \sigma_i}{\sum_{k=1}^{L} N_k \sigma_k} \right)$$

where:
L: is the number of strata, in this case 5 (nuts of the five clones);
$N_i$: is the size of the i-th stratum (i = 1, 2, ..., 5);
$\sigma_i$: is the standard deviation of the i-th stratum (i = 1, 2, ..., 5);
$N_k$: is the size of the k-th stratum (k = 1, 2, ..., 5);
$\sigma_k$: is the standard deviation of the k-th stratum (k = 1, 2, ..., 5);
n: is the total sample size.

Under these conditions i = k, therefore:

$$n_i = n \left( \frac{N_i \sigma_i}{\sum_{i=1}^{L} N_i \sigma_i} \right) \qquad (1)$$

And the total sample size n, according to Scheaffer et al. (2011), is given by:

$$n = \frac{\left( \sum_{i=1}^{L} N_i \sigma_i \right)^2}{N^2 D + \sum_{i=1}^{L} N_i \sigma_i^2} \qquad (2)$$

where:
N: is the number of sampling units in the population: $N = N_1 + N_2 + ... + N_L$;
$\sigma_i^2$: is the variance of the i-th stratum (i = 1, 2, ..., 5);
D: is the fixed variance of the mean that sets the precision of the sampling plan, associated with the maximum error permitted in the estimates: $D = B^2/4$.
With the variances ($s^2$) and respective standard deviations ($s$) estimated for each stratum and substituted into equations 1 and 2, the sample sizes $n_i$ (i = 1, 2, ..., 5) were estimated for each stratum together with the total sample size n, for the maximum margin of error B permitted for the estimates or the precision desired in the results. As there must be a relationship between the values set for B and the unit of measure of the phenomenon under study, and the values for nut weight are relatively low, these should be carefully chosen. In view of this, the values adopted for B were 0.1 g, 0.2 g, 0.3 g, 0.4 g, 0.5 g, 0.6 g, 0.7 g, 0.8 g, 0.9 g and 1.0 g; from these values, and taking the variances and standard deviations from Table 1, the total sample size and the sample size of each stratum were estimated and are shown in Table 2.

Results and Discussion
For example, assuming B = 0.2 g as the maximum margin of error for the estimates or level of precision desired in the results, the sample size of nut stratum S4, whose variance is $s^2$ = 2.5939, is $n_4$ = 49 nuts. For stratum S3, whose variance is $s^2$ = 1.5382, the sample size is $n_3$ = 37 nuts. Note that, for the same margin of error permitted for the estimates or level of precision desired in the results, the sample size varies as a function of the variance of the stratum.
The smaller nut strata (S2 and S1), as they are the most uniform, have smaller variances: $s^2$ = 0.7638 and $s^2$ = 1.0712 respectively (Table 1), and make it possible to obtain samples of smaller size (Table 2), whatever the error level of the estimates or precision desired in the results. Clone BRS 274 (S5), which produces the largest nuts, has the largest variance, $s^2$ = 7.6920, and, consequently, the largest sample sizes, regardless of the margin of error B permitted for the estimates or of the desired precision in the results (Table 2). It should be noted that the lower the margin of error B permitted for the estimates, i.e. the greater the precision desired in the results, the larger will be the sample size, whatever the clone (stratum) (Table 2).
These results agree with those of Thompson (2012), Pfeffermann and Rao (2009), and Ryan (2013), among others, according to whom three factors influence sample size: (a) confidence level (the higher the confidence level, the larger the sample size); (b) maximum error permitted for the estimates (the smaller the permitted error, the larger the sample size); and (c) variability of the phenomenon being investigated (the greater the variability, the larger the sample size). The data shown in Table 2 serve as a basis for choosing the sample size to be used in research that requires sampling of nuts from cashew clones, based on the level of error for the estimate that the researcher will accept as reasonable, and on the precision desired in the results of the research.
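As a minimal sketch of how the sample sizes discussed above can be computed, the code below implements the standard Neyman allocation of Scheaffer et al. (2011): the total size is n = (Σ Nᵢσᵢ)² / (N²D + Σ Nᵢσᵢ²) with D = B²/4, and each stratum receives nᵢ = n·Nᵢσᵢ / Σ Nₖσₖ. The variances are those quoted in the text. The stratum sizes are not given in the text, so the sketch assumes equal and very large strata; the function name and this assumption are illustrative only.

```python
import math


def neyman_sample_sizes(variances, stratum_sizes, B):
    """Return (n, [n_i]) for Neyman allocation with bound B on the error.

    n   = (sum N_i*s_i)^2 / (N^2 * D + sum N_i*s_i^2), with D = B^2 / 4
    n_i = n * N_i*s_i / sum_k N_k*s_k
    """
    if B <= 0:
        raise ValueError("B must be positive")
    sds = [math.sqrt(v) for v in variances]
    N = sum(stratum_sizes)
    D = B ** 2 / 4
    sum_N_sd = sum(Ni * s for Ni, s in zip(stratum_sizes, sds))
    sum_N_var = sum(Ni * v for Ni, v in zip(stratum_sizes, variances))
    n = sum_N_sd ** 2 / (N ** 2 * D + sum_N_var)
    n_i = [n * Ni * s / sum_N_sd for Ni, s in zip(stratum_sizes, sds)]
    return n, n_i


# Stratum variances quoted in the text, in the order S5, S4, S3, S2, S1.
LABELS = ["S5", "S4", "S3", "S2", "S1"]
VARIANCES = [7.6920, 2.5939, 1.5382, 0.7638, 1.0712]
# ASSUMPTION (not from the source): equal, very large strata.
STRATUM_SIZES = [10 ** 9] * 5

n, n_i = neyman_sample_sizes(VARIANCES, STRATUM_SIZES, B=0.2)
for label, size in zip(LABELS, n_i):
    print(label, round(size))
```

Under these assumptions the rounded allocations for B = 0.2 g are 84, 49 and 37 nuts for S5, S4 and S3, in line with the values quoted in the text. Smaller, finite strata would lower the results slightly through the Σ Nᵢσᵢ² term in the denominator.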
The use of stratified random sampling in this case followed the literature, with the aim of reducing sampling error and improving the precision of the estimates, as reported by Ryan (2013), Scheaffer et al. (2011) and Sabino and Villaça (1999). Rossetti (2001) studied the accuracy of field experiments with fruit trees and other perennial tree plants as a function of the size of the area. The best results were obtained by associating stratified sampling with the intraclass correlation coefficient and the basic principles of experimentation. Comparing those results with the results obtained in the present study (Table 2), it is possible to confirm the effectiveness of stratified random sampling.
Although heavier kernels have a higher market value, the size and weight of the nut are the indirect indicators used to select genotypes that produce such kernels when no analysis of the kernel is available. Considering the most-marketed class in the world (W 320), with a kernel weight of 1.5 g (GARRUTI et al., 2015) and a desirable kernel to nut yield of 25%, the corresponding nut weight is 6.0 g. However, taking into account the preference of the Brazilian producer for larger kernels, a nut of 8.0 g is the reference value adopted in the area of genetic improvement of Embrapa Agroindústria Tropical, which employs the following classification: small nut (weight < 8 g), medium nut (8 g < weight < 12 g) and large nut (weight > 12 g). Based on this premise, the proportion of nuts in each class was estimated for each clone (stratum) in this research, giving an idea of its position in the context of this classification (Table 3). It can be seen that the variance tends to decrease with nut size (Table 1), indicating greater uniformity in the smaller nuts.
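The classification and the kernel-yield arithmetic described above can be illustrated with a short sketch. The function names and the example weights are hypothetical. Because the text does not say how nuts weighing exactly 8 g or 12 g are classed, the code assumes they count as medium.

```python
def classify_nut(weight_g):
    """Breeding classes quoted in the text: small < 8 g, large > 12 g.

    ASSUMPTION: weights of exactly 8 g or 12 g are treated as medium.
    """
    if weight_g < 8.0:
        return "small"
    if weight_g > 12.0:
        return "large"
    return "medium"


def class_percentages(weights_g):
    """Percentage of nuts in each class for one clone (stratum)."""
    if not weights_g:
        raise ValueError("no weights given")
    counts = {"small": 0, "medium": 0, "large": 0}
    for w in weights_g:
        counts[classify_nut(w)] += 1
    return {name: 100.0 * c / len(weights_g) for name, c in counts.items()}


def nut_weight_for_kernel(kernel_g, kernel_to_nut_yield):
    """Nut weight implied by a kernel weight and a kernel-to-nut yield."""
    return kernel_g / kernel_to_nut_yield


# Hypothetical weights in grams, for illustration only.
example_weights = [7.5, 8.0, 9.9, 12.0, 12.5, 16.8]
print(class_percentages(example_weights))
print(nut_weight_for_kernel(1.5, 0.25))  # W 320 kernel at 25% yield
```

The last call reproduces the 6.0 g nut weight given in the text for a 1.5 g kernel at a 25% yield.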
Examining the position of each clone (stratum) with reference to the classification adopted in the area of genetic improvement of the cashew tree, it can be seen that the nuts of strata S5, S4 and S3 are the best represented in this context (Table 3). In the first case (S5), 93% of the nuts are large (weight > 12 g), with no small nuts (weight < 8 g). Those of medium size (8 g < weight < 12 g) total only 7%, and the mean nut weight is 16.79 g (Table 1). In the second case (S4), only 0.7% are small nuts (weight < 8 g). Large nuts (weight > 12 g) and medium-sized nuts (8 g < weight < 12 g) are respectively 72% and 27%, with a mean weight of 12.78 g (Table 1). In the third case (S3), the medium-sized nuts (8 g < weight < 12 g) total 91%, with only 4% large nuts (weight > 12 g) and 5% small (weight < 8 g); the mean weight of 10.08 g is higher than the 8.4 g of clone CCP 76 found by Ribeiro et al. (2004), and is therefore in agreement with producer preference. Taking as an example a value of B = 0.2 g applied to the mean for clone BRS 189 (S2) (8.29 g), the stratum that contains the minimum nut size for selection (8 g), the error represents only about 2% of the mean, a level of precision that can be considered sufficient for the process. Clone CCP 76 (S1), although considered a reference, had the lowest expression in this research, with the highest proportion (52%) of small nuts (weight < 8 g), and no large cashew nuts (weight > 12 g) (Table 3). Although 48% of the nuts were of medium size (8 g < weight < 12 g), the mean weight was 7.81 g (Table 1), which, although slightly below the reference value (8.0 g), is still close to that preferred by Brazilian producers.
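The 2% figure quoted above for clone BRS 189 (S2) can be checked as a relative margin of error, using B = 0.2 g and the mean weight of 8.29 g given in the text. The symbol $\bar{y}_2$ for the stratum mean is notation introduced here for illustration only.

```latex
\frac{B}{\bar{y}_2} = \frac{0.2\ \text{g}}{8.29\ \text{g}} \approx 0.024
\quad \text{(about 2.4\% of the mean)}
```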
The variability in cashew nut weight for each clone/stratum (Table 1) is a result of the criteria adopted for practical convenience in carrying out the activities in breeding projects to meet the demands of Brazilian producers. These criteria differ slightly from those established in the marketing standards for cashew nuts (BRASIL, 1975), where the classes are: large nut (weight > 11.11 g), medium nut (7.14 g < weight < 11.11 g), small nut (4.55 g < weight < 7.14 g) and tiny nut (weight < 4.55 g). Even so, a coincidence of at least 93% was seen in clone BRS 274 (S5) (Table 3).
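The degree of coincidence between the two classifications can be sketched as the percentage of nuts that receive the same class label under both. The thresholds are those quoted in the text; the function names, the handling of nuts lying exactly on a threshold and the example weights are assumptions made for illustration.

```python
def breeding_class(weight_g):
    """Breeding classes quoted in the text (small < 8 g, large > 12 g).

    ASSUMPTION: weights of exactly 8 g or 12 g are treated as medium.
    """
    if weight_g < 8.0:
        return "small"
    if weight_g > 12.0:
        return "large"
    return "medium"


def commercial_class(weight_g):
    """Marketing-standard classes (BRASIL, 1975) as quoted in the text.

    ASSUMPTION: a weight exactly on a threshold falls in the lower class.
    """
    if weight_g > 11.11:
        return "large"
    if weight_g > 7.14:
        return "medium"
    if weight_g > 4.55:
        return "small"
    return "tiny"


def agreement_percentage(weights_g):
    """Percentage of nuts given the same label by both classifications."""
    if not weights_g:
        raise ValueError("no weights given")
    same = sum(1 for w in weights_g
               if breeding_class(w) == commercial_class(w))
    return 100.0 * same / len(weights_g)


# Hypothetical weights in grams, for illustration only.
example_weights = [16.8, 13.2, 11.5, 9.0, 7.5, 6.9]
print(agreement_percentage(example_weights))
```

In this hypothetical example the nuts of 11.5 g and 7.5 g change class between the two schemes, because the commercial thresholds (11.11 g and 7.14 g) are lower than the breeding thresholds (12 g and 8 g).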
However, from the point of view of the normative commercial classification, all the clones under study showed good performance (Table 4). The medium-sized cashew nuts (7.14 g < weight < 11.11 g) of clone BRS 189 (S2), which contained the minimum size for selection (8 g), totalled 91%, i.e. 23 percentage points more than the 68% found with the classification of the area of Genetic Improvement of Embrapa Agroindústria Tropical. The medium-sized cashew nuts of clone CCP 76 (S1), whose average nut weight of 7.81 g is less than 8 g, totalled 75% by this classification, 27 percentage points more than the 48% of the Embrapa classification.
Even in the case of clones, which are genetically improved material, considerable variability was seen within each clone (Table 1), where the largest variances ($s^2$ = 7.6920 and $s^2$ = 2.5939) were found in the BRS 274 (S5) and BRS 275 (S4) genotypes, which produce larger nuts (Figure 2). This is probably due to the environmental component, since the production period lasts about four months and is subject to climatic variation. Similar results were found by Rossetti et al. (2019), who worked with cashew nuts of different genotypes, separated by the size of the nut only, with no identification of the original genotype, confirming greater variability in large cashew nuts.
This shows the influence of the size of the cashew nut, in addition to the effect of the genotype, in determining the size of the sample, which is a function of the variance of the phenomenon under study. For an error B of 0.2 g, the sample sizes of the BRS 274 (S5), BRS 275 (S4) and BRS 226 (S3) clones were respectively $n_5$ = 84 nuts, $n_4$ = 49 nuts and $n_3$ = 37 nuts.

Table 3. Percentage of cashew nuts according to the classification of the nuts in the five clones (strata), based on the criteria adopted for genetic improvement of the cashew tree.

Conclusions
Uniform stratified random sampling was effective as a methodology in this research.
The size of the cashew nut sample, for the purpose of morphometric evaluation, depends on the variance of the clone (stratum), the margin of error permitted or tolerated for the estimates, and the precision desired in the results.
The size of the sample to be taken should be established based on the error permitted as acceptable and the degree of precision desired in the result by the researcher.
The size of the sample of cashew nuts originating from cashew tree clones varies with the clone, and depends on the size of the nut.