SPATIAL PATTERN DETECTION MODELING OF THRIPS ( Thrips tabaci ) ON ONION FIELDS

Onion (Allium cepa) is one of the most cultivated and consumed vegetables in Brazil and its importance is due to the large laborforce involved. One of the main pests that affect this crop is the Onion Thrips (Thrips tabaci), but the spatial distribution of this insect, although important, has not been considered in crop management recommendations, experimental planning or sampling procedures. Our purpose here is to consider statistical tools to detect and model spatial patterns of the occurrence of the onion thrips. In order to characterize the spatial distribution pattern of the Onion Thrips a survey was carried out to record the number of insects in each development phase on onion plant leaves, on different dates and sample locations, in four rural properties with neighboring farms under different infestation levels and planting methods. The Mantel randomization test proved to be a useful tool to test for spatial correlation which, when detected, was described by a mixed spatial Poisson model with a geostatistical random component and parameters allowing for a characterization of the spatial pattern, as well as the production of prediction maps of susceptibility to levels of infestation throughout the area.


INTRODUCTION
Onion (Allium cepa) is one of the most cultivated and consumed vegetables in Brazil.The social importance of the crop is due to the large laborforce involved.It is estimated that 70% of the production is obtained under small scale, because it is typically grown in small and medium sized properties.It is an annual plant for bulb production, biannual for seed production, and propagated by direct sowing, bulbs or seedlings planted in beds and transplanted to the field.
One of the main pests that affects onion crops is the Onion Thrips (Thrips tabaci), which in high infestation levels can damage the crop (Workman & Sci.Agric.(Piracicaba, Braz.), v.66, n.1, p.90-99, January/February 2009 Martin, 2002), with reductions in the production up to 80% during hot and dry periods (Sato, 1989).The insect is typically found at the base of the leaves.It feeds from the sap and the leaf parenchyma causing gray spots which gradually change to silver as a result of the external tissue damage of the leaves.Massive attacks on the aerial part of the plant cause loss in bulb production, which reduces their size and quality, damaging the commercial value and creating obstacles to exports.When an attack is very intense, the leaves become yellowish, dry and with wrenched tips, causing wilting and death of the plant (Sato, 1989), and also allowing for the entrance of water to the bulb, which turns rotten.The insect is also considered a vector of a phytopathological agent with the capacity to transmit virus to the plant.
The insect development occurs in the four phases of egg, nymph, pupa and adult, with the nymph and adult stages damaging the production, because the pupal phase is restricted to the soil.The nymph has low mobility, whereas the adult, although winged, has restricted movement.The development cycle varies typically from 14 to 30 days, changing to 10 and 11 days when the temperature is over 30°C.
The spatial distribution of thrips in commercial fields is important for the efficient application of insecticides.However, this has not been considered in crop management recommendations, experiment planning and sampling plans.Considering the low mobility of nymphs and adults it is reasonable to assume that the wind is the main dispersion factor for the thrips that potentially determines the spatial pattern.
A spatial pattern can be classified as random, aggregate or uniform.The random pattern occurs when there is a constant and independent probability of infestation for all the plants, while the aggregate pattern is associated with low insect mobility.The uniform pattern rarely occurs naturally but can be induced, for instance, by alternated planting of resistant and susceptible plants.In order to study whether infant leukemia cases tend to be close in space and time, Mantel (1967) proposed a randomization test, based on matrices of time and space distances between observations.This test can be used to look for spatial correlation in an insect distribution, but its usage has not been considered in practical applications, and in particular, in studies of the spatial distribution of the Onion Thrips.
It is common in insect distribution studies, to find the use of indices based on the relationship between the variance and the mean, such as David & More index, the Taylor power law, and the aggregate indices of Lloyd and Iwao, among others (Ruiz-Cárdenas et al., 2003).However, these indices ignore the spatial location of the samples, have limited capacity to describe spatial patterns, and strongly depend on the size of the sample unit.
Geostatistical methods (Isaaks & Srisvastava, 1989;Goovaerts, 1997) have been used to describe spatial patterns of insects as, for instance, in Grego et. al. (2006).Such methods were originally developed for continuous response variables, with several computational implementations available for data analysis.The insect counts are discrete and typically distributed in clusters, with many zero counts.Therefore, the data cannot have a covariance structure of the type assumed by traditional methods of geostatistical analysis, with a stationary spatial covariance structure in the study area (Ruiz-Cárdenas, 2002).For this reason it is appropriate to use models that incorporate explicitly a data generating mechanism such as the Poisson distribution, combined with structures that describe the spatial pattern of the counts.These kinds of models have been proposed in the statistical literature (Diggle et al., 1998) but have had few practical applications.
This paper describes a study of the spatial distribution of the Onion Thrips with data from surveys of four different properties with different infestation levels and planting methods.We aimed to detect spatial patterns in the occurrence of the Onion Thrips at different production fields and propose an statistical model for such patterns.We adopt the Mantel randomization test (Manly, 2006) to decide for the presence of spatial autocorrelation which when detected was modeled by a mixed spatial Poisson model with a random term given by a geostatistical component.This model allows the characterization of the spatial pattern as well as the production of maps of levels of susceptibility to infestation in different areas.

Data description
This work is motivated by a set of data originated from a study involving sampling onion thrips in onion crops of four different farms, located in the municipality of São José do Rio Pardo, São Paulo State, Brazil (21°36'S, 43°53'W; altitude 705 m), from June to September, 1996.The aim is to study the spatial and temporal distribution of thrips.The four chosen properties used the onion hybrid Granex 33 and the seedling planting method.The trial areas were chosen with neighbors who adopted different kinds of planting techniques and had different infestation levels.
Details referring to the kind of planting in the neighborhood and collection dates and numbers of samples collected in the different farms are shown in Table 1.The São Paulo farm is located at a high el-Sci.Agric.(Piracicaba, Braz.), v.66, n.1, p.90-99, January/February 2009 evation of the region and the nearest neighboring onion crop is situated over one kilometer away.The neighborhood of Estância Bela Vista had already had some crops attacked by the onion thrips pest.
The sampling unit was a 1 m radius circle with a center stake.One plant was then randomly selected from within the circle.The position of the stakes in the four farms, in general in a 10 × 10 m grid, with some variations at Fazenda São Paulo, is shown in Figure 1.The measured variables were the stake location on the coordinate axes, the number of nymphs, the number of adult insects and the number of leaves per plant.The number of samples and sampling times varied from farm to farm as shown in Table 1.The response variables are discrete as a result of counting.In some cases, the counts are multiples of 5 or 10 and some values over 100 were truncated to 100.
Figure 2 shows box-plots for the average number of insects per leaf, for the four farms as a function of sample time.There is a great variability among the counts and some outliers are also present, not all of them being influential to model fitting.At the São Paulo farm the average number of insects and the variability increased with time, whereas at the other farms, the average increased and then decreased.In all cases the observations above the median are more variable showing a positive asymmetry with some extreme values.
At the São Paulo farm the lowest average number of insects per leaf and also the lowest variance were found at the date 07/31, with one insect per leaf as the maximum value.In contrast on 09/04 this farm had a much larger average number of insects per leaf and a much greater variability.The percentage of in- Table 1 -Characteristics about the data precedence, types of neighbors, sample times and number of samples.
Sci. Agric.(Piracicaba, Braz.), v.66, n.1, p.90-99, January/February 2009 fested plants ranged from 35% to 100%.For the Estância Bela Vista, the lowest average number of insects per leaf occurred on 07/11 and 08/14 with 89% to 100% plants infested.The Rosário farm had only 50 plants sampled and the highest average number of insects per leaf on 08/11 and 08/18.Sítio Novo II had the least average for the number of insects per leaf with low variability except for one outlier count of 30.

Mantel's test for the detection of the spatial pattern
The non existence of a spatial pattern in the dispersion of insects may be considered a randomization hypothesis and the existence of a spatial pattern can be tested through the randomization of the order of the observed values (Manly, 2006).Randomization tests are based on the fact that, if the null hypothesis is true, then all of the possible orders of the data have the same chance of occurrence.Therefore, the value e o of a statistic E is calculated for a set of observations, and then a large number of randomizations are made.For spatial data these randomizations are made by randomly reordering the data.For each randomization a value e a is calculated and the set of the e a values generate an approximation of the randomization distribution of E. Just as for classic statistical tests, the decision is guided by a p-value, which in the case of randomized tests is given by the proportion of the e a values that are larger than or equal to e o , for a onesided test.For instance, if p < 0.05, it is concluded that there is evidence that the null hypothesis is not true (Manly, 2006).
Randomized tests have some advantage in comparison to classic statistical tests.For example, the statistics are usually easy to calculate relatively to the classic statistical tests.They are based on non standard statistics and they do not need previous information about the population from which the samples were taken.Also, they can be applied with non-random samples which can consist only of the data that need to be analyzed (Manly, 2006).However, the randomization tests are easier to be justified when the analyzed samples are random or the experimental design suggests a randomization test.
Usually, when considering spatial data, it is desired to test the null hypothesis of a random spatial pattern versus the alternative of a non-random spatial pattern.A test for this hypothesis was proposed by Mantel (1967).The test is implemented as follows.Let a variable be observed in n locations.Two symmetric matrices A and B are obtained, each with n × n dimensions.The elements represent distances between the observations.These matrices can be denoted as Elements of the matrix A are the Euclidian distances between the stakes with locations given by (x 1i , x 2i ) and (x lj , x 2j ) i.e., with elements of the form and B is the matrix with elements 2 ) ( , where Z is the mean of the number of insects per leaf.The statistic test is given by the Pearson correlation coefficient between the correspondent elements of A and B, i.e., , which produces the r 0 value when calculated for the observed values.For the randomization test the rows and columns of one of the matrices are permutated a large number (N) of times, and the values r ak are obtained, for k = 1, 2,…, N. The proportion p of values r ak > r 0 is then compared with a pre-fixed significance level α (for example, 0.05) and the null hypothesis is rejected if p < α (Manly, 2006).
As the matrices A and B are symmetric, the correlation amongst all the elements outside the main diagonal is the same as the correlation of the m = n(n+1)/2 elements in the upper or lower triangular part of the matrix.Note that the only term of (1) that is altered by changing the order of the elements in one of the two matrices is the sum of products Z = Σ a ij b ij .
Other possible metrics used for the calculation of the distances are Euclidian with standardized data, Euclidian squared, Euclidian squared with standardized data, proportional distance and sample difference.The alternative is given by Snäll et al. (2003) who built a randomized test using flexible forms for the relation between the distance measurements, given by the structure of additive generalized models.
When the Mantel test rejects the null hypothesis there may be interest in knowing the kind of association amongst the variables.This can be shown by the graph of ij b versus .ij a One of the possible models of association is the simple linear regression, in which the elements of the A matrix give an explanatory variable and the elements of the B matrix a response variable, so that, b ij = β 0 + β 1 a ij + ε ij ,where β 0 e β 1 are parameters to be estimated and ε ij is the error associated with the response assumed to be Gaussian, independently and identically distributed.This assumption is a pragmatic approach avoiding more complex structures for the error term which would require further modeling assumptions we wish to avoid at this exploratory stage.Also, more complex forms of spatial dependence than given by the linear relation can, in principle, also occur.Our approach is to rely on simple assumptions for the randomization tests and leaving more complex structures to be considered by the model discussed in the next Section.
In this study, the randomization test for spatial pattern was carried out on the observations for each sampling date.The test can be extended for the detection of time patterns.However, this raises the question of how to combine the information from several units of observation.Although such alternative has been studied for the data on thrips occurrences, it was decided not to include here the results because of the small number of observations in time and the lack of a specific interest in testing for time patterns.

Modeling the spatial pattern
Having detected a spatial pattern, it may be of interest to describe the pattern by means of a stochastic model.Modeling allows not only the characterization of the dependence pattern but also for the prediction of quantities of interest such as a map of expected levels of infestation over the area, the proportion of the area with infestation above or below a certain threshold, and areas with high and low infestation, among others possible quantities of interest.
One possible way of modeling the spatial distribution is by adopting the geostatistical framework, which associates the level of spatial dependency with distances between sampled plots.Usually the description of the spatial dependence assumes that the closest sampled plots are more alike than those farthest apart (Montagna, 2001).Diggle et al. (2003) uses the term geostatistics to identify a part of the spatial statistical methods in which the used model describes a continuous variation of the observations over space.
The basic geostatistical data format is (x i , y i ), i=1, 2, ..., n, in which x i = (x 1i , x 2i ) identifies the spatial location, generally in the two dimensions and i y is the measure of interest at the x i position of the ith observation.The response variable can be potentially measured at any point within the studied region (Diggle & Ribeiro Jr, 2007).
The geostatistical model is specified by assuming two processes over the study region (Diggle et al., 1998;Diggle & Ribeiro Jr, 2007) described as follows.{Y(x)}: x ∈ A is a measurement process within the Sci.Agric.(Piracicaba, Braz.), v.66, n.1, p.90-99, January/February 2009 study region A which is observed at a set of locations x to obtain the y i 's of the observed data.This first process is related to an underlying Gaussian process S = {S(x): x ∈ R 2 } with mean µ, variance σ 2 and correlation function ϕ(µ), where µ is the distance between pairs of observations.The values of S(x) are usually not directly observed.Conditional independence is assumed in the sense that the Y(x) are independent, conditionally on the values of S(x) meaning all the spatial dependency is modeled through S(x).The exact form of the relation between the two processes may vary according to the type of variable being measured.For instance, when Y follows the Gaussian distribution, the model can be written as Y i = S(x i ) + Z i , in which the Z i values are mutually independent and follow the normal distribution, with mean 0 and variance τ 2 .In this case the observations y i can be seen as a noisy version of S(x i ) at the location x i , and, for a finite set of plots, the random vector Y follows a multivariate Gaussian distribution.More generally, Y may follow other distributions and Diggle et al. (1998) specify a model within the class of the generalized linear model (McCullagh & Nelder, 1989) in which the S process defines random effects with spatial dependence structure.Diggle & Ribeiro Jr (2007) call this a generalized linear geostatistical models (GLGM).This model allows the explicit specification of a Poisson distribution for the observations, which is compatible with the insect counting structure of the data considered here.
The GLGM is a special case of a mixed generalized linear model, in which the Y i , i=1, 2, …, n are conditionally independent given S(x), with expected values given by E[Y i S(x)] = λ i and linear predictor h(λ i ) = S(x i ), i=1, 2, …, n with a known link function h(.), which, for the Poison model here considered is typically given by a logarithm function.The model can be extended allowing for covariates considering S(x i ) = S(x i ) + d(x i ) T β , in which d(x i ) is the observed covariate value and β is the regression parameter vector (Diggle et al., 1998;Diggle et al., 2003).
Let Y(x i )S(x i ) be the observed total number of insects with a Poisson distribution with mean t i exp[S(x i )], i=1, 2, …, n in which t i represents the number of leaves.Then the probability function is given by The likelihood function is often considered for inference about the model parameters within the context of generalised linear models.However, in this case the likelihood function does not have a closed form and is given by where: R is the correlation matrix for S and with dimension equal to the number of observations which cannot be solved by analytical or numerical methods.Each element of R is given by the corresponding value of the correlation function of the S process and therefore having model parameters within non-linear functions which explains the lack of such solutions.A possible solution is to use Monte Carlo Markov Chain (MCMC) methods and a computational implementation is available through the package geoRglm (Christensen & Ribeiro Jr, 2002) for the R statistical environment (R Development Core Team, 2007).
For discrete random variables, the variogram is not a natural summary of the data, but it may be useful as a diagnosis tool, after fitting the mixed generalized linear model (Diggle & Ribeiro Jr, 2007).In this case, the variogram obtained from the estimated parameters can be compared to the experimental variogram, obtained through the residuals from a GLM model fit.The variogram is given by which can be written as However, this approach must be used with caution because the variogram is even more erratic then the one usually obtained for data with a symmetric and continuous distribution, because of the asymmetric data.
After the choice of a specific model, a map that describes the behaviour of the study variable over the region can be obtained.Supposing that the parameters are known and that the interest is in the expected insects number given by λ( x 0 ) = exp[β + S(x 0 )], for the location x 0 = (x 10 , x 20 ), from the S marginal distribution and the YS conditional distribution, it is possible to simulate the conditional distribution of [Sy], using the MCMC method.The predicted surface is given (Diggle et al., 1998) where β ˆ is the process mean in this case because there are no explanatory variables or trend, and ( ) is the linear kriging predictor and Var(x) is the prediction variance.

Spatial pattern detection through Mantel's randomization test
The Mantel test was applied separately for each sampling date and each farm, and the obtained p-values contrasted with the adopted 5% significance level.For the Fazenda São Paulo, there was evidence of a spatial pattern in the number of insects per leaf for the first three data collections on 10 th , 24 th and 31 th of July.These patterns can be observed in Figure 3 where symbol sizes are proportional to the number of insects per leaf.In general, considering all the farms and dates, the distribution of the mean number of insects per leaf is asymmetric and does not show trends in relation to the spatial coordinates.The linear regression between the number of insects by leaf distances and the stake location distances show that, for the above mentioned dates, there is evidence of a positive association in conformity with Table 2 which shows, as well, analogous results for the dates that detected spatial pattern for the other farms.
For Estância Bela Vista, the spatial pattern was detected only for the third data collection on 8th of August.The dispersion plot for this date are shown in Figure 4.For Sítio Rosário, evidence of spatial patterns was not found for any of the dates.Analysis for Sítio Novo II, suggests presence of spatial pattern for the 2 nd , 4 th and 6 th data collections made on the 4 th and 27 th of June and for 4 th of July, as shown in Figure 5.
Geostatistical generalized linear models with Poisson distributions and logarithmic link functions were used for the modeling of the data for farms and dates that presented some evidence of spatial pattern.Maximum likelihood parameter estimates were obtained by the MCMC algorithm and results are summarized in Table 3.A total of 120,000 iteration chains were obtained, with a burn in cycle of 20,000, keeping the first   of every 100 generated samples, amounting to a total of 1,000 samples.The obtained chain for each parameter was analyzed to verify the convergence of the MCMC algorithm.The estimates for the φ parameter reflects the spatial correlation, and for the case of an exponential correlation model the practical range of spatial dependence corresponds to three times the parameter value.The interpretation of the extent of the correlation also depends on the distances between points within the area, which vary from 10 to 170     Apparently there is some influence of the kind of the plantation in the neighborhood on the number of insects per leaf on plants.Estância Bela Vista had as the neighborhood an area already infested with thrips and showed the highest means for the number of insects per leaf and the greatest proportions of infested plants, whereas Fazenda São Paulo, isolated from other plantations of onion, was the one with the smallest proportion of infested plants, however increasing with time.This conjecture cannot be tested statistically with the available data, but can be considered for future studies.

CONCLUSIONS
The adopted methods allow testing for the presence of spatial patterns in the distributions of onion thrips using Mantel's randomization test, as well as suggest mechanisms for describing the processes by means of the geostatistical generalized linear model which provides a possible model for the data which also allows for covariates that could affect the insect distribution.The usage of such methods is new in this context and it is recommended that they should be considered for the detection and description of spatial patterns of pests in the field.The data analysis using the Mantel test supports the conjecture of the presence of spatial patterns, although not consistently for all dates which may be influenced by the high variability of the observations, with a possible effect of the imprecise recording of high values.The effects of non-measured covariates may also have generated heterogeneous conditions of sampling, therefore hiding spatial patterns.It is recommended that future samplings should be carried out including some pairs of observations with smaller spaces between them to allow a better descrip-tion of the spatial patterns.This is especially relevant when considering the limited mobility of this insect.

Figure 1 -
Figure 1-Localization of the stakes in each farm.

Figure 2 -
Figure 2 -Box-plots for the average number of insects per leaf.

Figure 3 -
Figure 3 -Dispersion graphs for the mean number of insects, Fazenda São Paulo, for dates 07/10; 07/24 and 07/31 (symbol sizes are proportional to the number of insects per leaf).

Figure 5 -
Figure 5 -Dispersion graphs for the mean number of insects, Sítio Novo II, for dates 06/04, 06/27 and 07/04 (symbol sizes are proportional to the number of insects per leaf).

Figure 4 -
Figure 4 -Dispersion graphs for the mean number of insects, Estância Bela Vista (symbol sizes are proportional to the number of insects per leaf).

Table 3 -
Point estimates and confidence intervals for the parameters of the geoestatistical model.

Table 2 -
Regression models for the distance matrices from the randomization test.