SPATIAL STATISTICS APPLIED TO SOYBEAN PRODUCTION DATA FROM PARANÁ STATE FOR 2003-04 TO 2009-10 CROP-YEARS

In the current study, we performed a soybean production spatial distribution analysis in Paraná State. Seven crop-year data, from 2003-04 to 2009-10, obtained from the Paraná Department of Agriculture and Supply (SEAB) were used to develop a Boxmap for each crop-year, show soybean production throughout this time interval. Moran’s index was used to measure spatial autocorrelation among municipalities at an aggregate level, while LISA index local correlation. For each index, different contiguity matrix and order were used and there was a significance level study. As a result, we have showed spatial relationship among cities regarding the production, which allowed the indication of high and low production clusters. Finally, identifying main soybeanproducing cities, what may provide supply chain members with information to strengthen the crop production in Paraná.


INTRODUCTION
Brazil is one of the greatest world powers in soybean production, second only to the United States.From 2000 to 2010, Brazilian production accounted for about a quarter of global output against 39% of U.S. (FAOSTAT, 2012).In the Brazilian crop-year of 2008/2009, a total of 57,165,500 tons of soybeans was produced; the largest production was 17,962,500 tons (31.42%) by Mato Grosso State, followed by Paraná with 9,509,700 tons (16.64%).Regarding the Brazilian production between 2003-04 and 2009-10, Paraná contributed with approximately 19% (CONAB, 2012).Soybean production is not uniform among Brazilian states or even among the municipalities within a state.Thus, it becomes important to study spatial distribution to improve classical methods including the use of new technologies such as remote sensing, geographic information systems (GISs) and spatial statistics, which have been used to solve problems and provide detailed information on agriculture and environment (DORIGO et al., 2007).
Exploratory Spatial Data Analysis (ESDA) use is justified since it aims to describe spatial distribution, patterns of spatial association (spatial clustering), varied spatial regimes or other spatial instabilities (non-stationary status) and identifying outliers (non-standard observations).So, the analysis consists of observing events to see if they have some kind of systematic pattern or are randomly distributed in space (MARANDUBA JÚNIOR & ALMEIDA, 2009).
Information technology advance is one of the factors responsible for spatial statistic expansion, owning to GIS data analysis expediency.There are several methods to conduct spatial analysis; however, the Global Index or Moran's Index (I) and the Local Index of Spatial Association (LISA) have stood out, once they use weight matrix (W) to define neighbours, which may determine whether there is autocorrelation within data (ANSELIN, 1995).
The aim of this study was to develop a method to characterize soybean production peaks throughout the crop-years of 2003-04 and 2009-10 analysing its spatial distribution for 399 municipalities within Paraná State; as a result, showing if there is some spatial correlation among them.

MATERIAL AND METHODS
The soybean production municipal data obtained from the Paraná Department of Agriculture and Supply (SEAB, 2010) for seven crop-years between 2003/2004 and 2009/2010 were used in this study.Figure 1 shows Paraná State location within Brazil and the studied cities.The GeoDa 0.9.5-i.software was used to perform spatial data exploratory analysis and the data viewing.According to LOPES (2005), data viewing consists of attribute presentation per area by using thematic or chloropletic maps, through which is possible to check influence and behaviour that each event has on others.This is the simplest form of identifying outliers.
To identify whether the data was homogeneous, a so-called Boxmap was built for each crop year.The purpose of these maps, which are based on the Boxplot graphic, is to identify possible overall outliers, that is, municipalities with low or high soybean production outlines.Non-standard values (outliers) are those outside the interval defined by the Upper Limit NOVAES et al. (2011) argue that in an exploratory spatial analysis, it is essential to check the features of spatial dependence, identifying these as attributes of a studied phenomenon correlated in space.LOPES (2005) states that three basic elements are found throughout ESDA techniques: spatial proximity matrix (W), vector of deviations (Z) and means vector (W z ).
A spatial proximity matrix is the basic element for spatial analyses, and is a useful tool for describing the spatial arrangement of municipalities.Given a set of n areas (A 1 , A 2 , A 3 ... A n ), a W nxn weight matrix is built, where each w ij element is the closeness between A i and A j .PIMENTEL & HADDAD (2004) state that each w ij cell in the matrix W nxn indicates a relation between region i and j (in a system of n regions).The cell w ij is zero when the regions are not neighbours; otherwise, it is 1, which means that the element w ij = 1 if the region A i is a A j neighbour, otherwise w ij = 0, for i ≠ j = 1, 2... n.
By using spatial proximity matrices to consider contiguity, it is important to agree the forms of neighbourhood.For this purpose, used criteria are based on queen, rook and bishop movements of a chess game (DALPOSSO, 2010) PIMENTEL & HADDAD (2004) define that a matrix constructed based on Queen criterion considers as neighbours regions with a common edge or border, as well as a common node, which means that according to this definition microregions are neighbours when they are tangent in space.According to the same authors, a matrix built from the Rook criterion considers as neighbour exclusively regions that are border areas.ANSELIN (2002) defines neighbour regions according to Bishop criterion, that is, whether they have any node in common.
CARVALHO YWATA & ALBUQUERQUE (2011) define that besides first order neighbourhood, higher orders can also be used.Second-order neighbourhood definition, for example, the polygons i and j will be neighbours if there is another polygon k, for which i and k are first order neighbours, and j and k are also first order neighbours.
According to neighbourhood simulation presented in Figure 2, CHIARINI (2008) reports that B cells are first order with respect to A, by "Rook criterion"; with this in mind, C and D cells are second order neighbours in relation to A and also first-order with respect to B. In the current study, we used Rook and Queen Criteria, considering first and second order neighbourhood for both aforementioned criteria.
In spatial statistics calculation, elements of the matrix W' row are multiplied by one scale data in order to the sum being normalized to 1.
A basic element used in ESDA techniques is the deviation vector Z, which represents a vector of n observations within the average.According to MARQUES (2009), in order to calculate that, we firstly have to calculate the average (μ) of the attribute under study (y 1 ... y n ) considering the n objects.Each element i of Z (z i ) is obtained by subtracting the average for the corresponding attribute value (z i = y iμ).
Another basic element is the vector of weighted averages (W z ), which results from multiplying the transpose of the deviation vector (Z T ) by the normalized proximity matrix (W') (MARCONATO et al, 2012), obtaining a matrix W z where each W zi element contains a corresponding value of deviation average of neighbour areas to I, featuring a spatial moving average (OZON, 2011).
ESDA techniques are presented as indices, which measure spatial association (Moran's Index), scatter plot (Moran Scatterplot) and maps (Boxmap).According to NOVAES et al. (2011) Moran's Index is an index that expresses spatial autocorrelation, indicating grouped areas (clusters), whose attributes are spatially similar.TEIXEIRA et al. (2008) stated that spatial weight matrix when normalized in line (elements sum is 1) provides Moran's I value as in equation 1, where Z T is the transposed vector of deviations.Equation 2shows deviation vector calculation and equation 3 how to calculate the weighted average.2008) defines Moran's I index as a measurement whose result indicates whether data are distributed randomly in space.Moran's coefficient (I) has an expected value E(I), which is defined by the following equation E(I) = -[1/(n-1)], where n indicates the number of polygons within the studied area.It should be noted that unlike an ordinary correlation coefficient, this statistic is not centred at zero, and therefore, such statistics vary between -1 and +1 (MONTENEGRO & BETARELLI JUNIOR, 2008).With regard to testing Moran's Index significance, a reference distribution was generated using random production values and calculating a new output value for each permutation, enabling the following test hypotheses: H 0 : I = 0 which is the null hypothesis of no spatial correlation versus an alternative hypothesis H 1 : I ≠ 0. If the descriptive level (p-value) is less than the significance level set (usually a probability of 0.05), the null hypothesis of no spatial correlation is rejected.
Otherwise, if the descriptive level is greater than or equal to the significance level, the null hypothesis cannot be rejected (DALPOSSO, 2010).
Moran's I index only returns a numeric value, which can easily lead to interpretation mistakes.An alternative to help index value interpretation is to build Moran Scatter Plots.MONTENEGRO & BETARELLI JUNIOR (2008) define the Moran Scatter Plot as a way to visualise overall indicators of spatial autocorrelation, revealing spatial lag of the aimed variable (i.e.average attribute of neighbourhood) on vertical and horizontal axes.Besides global measure of spatial linear association, this diagram is divided into four quadrants, namely: High-High (HH), Low-Low (LL), High-Low (HL) and Low-High (LH).Moran Scatter Plots are four-quadrant graphics representing relationship between vector Z values and weighted means (W z ).
The HH quadrant represents positive values for Z and W z , i.e. areas with high attribute values surrounded by high value areas.The LL quadrant represents negative values for Z and W z , that is, areas with low attribute value next to low ones.The LH quadrant represents areas with negative values for Z and positive for W z ; these low value areas are near high value areas.The HL quadrant, which represents positive values for deviation vector Z and negative for weighted means, express areas with high values for a given attribute close to low value neighbours.
In this graphic, Moran's index (I) represents a regression line slope, that means this value indicates line inclination from W z to Z. MONASTÉRIO et al. (2008) emphasize that Moran Scatter Plot represents a variable standardised value for each unit within the abscissa and ordinate axes.One way to avoid a wrong conclusion about the existing spatial autocorrelation, based only on Moran Index (I), is to break down the overall index due to its local constituents.According to PREDEBON et al. (2010), the local Moran's index (LISA), which is an overall measure break-up, produces specific values for each geographical area indicating statistical significance of cluster formation in similar geographical areas or isolated points for a particular attribute.ANSELIN (1995) shows that a LISA aims to meet two goals: enabling significant patterns of spatial association and be a significant break-up of overall index.
The LISA index is represented by equation ( 4 where, σ 2 is the distribution variance of deviation values. Once statistical significance of local Moran's index has been determined, it would be useful to generate a map showing regions with local correlation that are significantly different from the other data.These regions are seen as "pockets" of non-stationary status, being areas with their own spatial dynamics and deserving detailed analysis (DALPOSSO, 2010).

RESULTS AND DISCUSSION
Figure 3 shows seven Boxmaps related to soybean production by municipality throughout the crop-years of: 2003/2004, 2004/2005, 2005/2006, 2006/2007, 2007/2008, 2008/2009 and 2009/2010.According to Figure 3, there have not been major changes in production quantity for municipalities during studied period.There are large production outliers for all crop-years.The cities Cascavel, Toledo, Assis Chateaubriand (Western Paraná) and Tibagi, Castro, Ponta Grossa (Mideastern Paraná) are among outliers with the highest productions for the seven crop-years.In many of Northwestern Paraná cities and almost the entire Eastern region (including the metropolitan region of the Greater Curitiba) null or negligible production dominate, although not configuring an outlier on low type in the Boxmap.This is justified because in Northwest most agricultural areas are intended for sugar cane production and livestock, while in East, the Atlantic forest dominates.Table 1 shows the values presented in Figura 3 captions, which presents the Boxmap maps.These values describe intervals within the BoxPlot, quartile (percentile) division and upper and lower limit values.Lower Limit values are negative, being why cities that presents null production have no low-type outlier.
Tables 2 and 3 show Moran Index (I) values, which were obtained using different spatial proximity matrices (Queen and Rook) of first and second order respectively.The expected value (E (I) = (-[1/(n-1)]) for Moran's Index, in both cases, considering 399 municipalities is -0.0025.Considering the first-order spatial proximity matrix, the Moran's Indexes (I) are shown in Table 2.All studied Moran's Index (I) values were greater than E (I)(Table 2).Through this, a positive spatial correlation can be characterized, i.e. cities with high (low) production have neighbours with high (low) production.Both criteria (rook and queen) have similar Moran's I values.In case of analysing second-order neighbour matrix, it can be seen higher Moran's I values than expected (E(I)), which characterises a positive spatial autocorrelation among municipalities for soybean production levels.When analysing Moran's index for both criteria (Queen and Rook) and both neighbourhood matrices (W (1) and W (2) ), it is observed greater difference when we use different neighbourhood orders.Spatial autocorrelation is lower by using W (2) matrix (Table 3), when compared with W 1 (Table 2).
Moran Scatter Plots are shown in Figures 4 and 5 to assist in the interpretation of overall index (I) that are presented in Tables 2 and 3.As Moran's index represents the slope of regression line, a scatter plot allows evaluating the production value in a given municipality and the neighbour average.  2 and 3) by means of Moran Scatter Plots.There is higher concentration of municipalities (points) in the AA (high-high) and BB (low-low) quadrants, which explains Moran's index positive value.There are also, in both figures, cities in quadrants AB and BA, which represent negative spatial association, being not only represented in numerical value of overall index (Tables 2 and 3).There are larger values for neighbour mean deviations (W zi ) using the matrix W (1) , compared with W (2) .That represents lower line slopes presented in matrix W (1) plots contrasting with matrix W (2) .This fact makes Moran's index values (I) lower for second-order matrices, i.e. defining a neighbourhood of higher order for both criteria one can present decreasing spatial autocorrelation values.
Figure 4 and 5 shows that criterion used to determine neighbours with Moran Scatter Plots, either Queen or Rook, promoted no large variations in obtained results.According to Tables 2 and  3, Moran's index values (I) for Rook Criterion are higher compared with those from Queen.As a result, we conclude that Rook Criterion represents higher spatial correlation concerning soybean production in each city.
Evaluating values over the crop-years, there was little change in Moran Scatter Plots for each criterion and neighbourhood order.It can also be highlighted little change in spatial association orders, concluding that high-production municipalities maintained the large-scale production, while those with little or no production in same way kept the low or absent production level.
We used LISA index to analyse production spatial correlation value for each municipality.LISA CLUSTERS MAP (LCM) were prepared taking into account both criteria, as well as neighbourhood orders for each crop-year.By the end, clusters with low and high production were determined with only 5% significance indexes for studied crop-years.
Figure 6 shows LCM for Queen Criterion throughout studied years and spatial proximity matrixes of first and second order.Considering matrix W (1) use, the formation of large agglomerates (clusters) AA-type must be detached, i.e. high-production municipalities surrounded by highproduction cities.Thus, we might highlight two major centres: the West of Paraná State, especially municipalities located around Toledo and Cascavel cities, and the East Central Paraná State, in municipalities near Castro and Tibagi cities.
Differently from the aforementioned, when analysing LCM for second order matrix, there was no formation of AA clusters in locations close to Castro and Tibagi cities.It also noted that for 2008/2009 crop-year, using matrix W (2) , there was a minor AA cluster formation.As observed in able 1, this crop-year had the lowest soybean production average out of all studied crop-years.
The lower production related to 2008/2009 crop year may be related to droughts during November and December in Paraná State.This weather event had more negative effects due to much of planted soybean is from early cycle and early planting (late September and early October), therefore crop was at flowering and fruit production stage when drought happened; causing yield loss (CONAB, 2009).
Average yield for Paraná in 2008/2009 was approximately 14% lower than the other studied crop-years (CONAB, 2012).This yield reduction brought decreased grain production, even for larger areas of 91,900 ha according to CONAB (2009).Such disturbances did not restrict clusters formation for first-order neighbours as can be seen in Figure 6.
There was also two BB-type large cluster formations, where similar production level municipalities surround municipalities with little or no production, and the two clusters are located in coastal and Northwest regions.
Figure 7 represents LCM obtained when using Rook Criterion.When compared to Figure 6, concerning first order neighbours, we can observe clusters in same regions, indicating that neighbourhood criterion did not interfere in their formation, even for AA as BB cluster types.
Considering the matrix W (2) , we can observe in Figure 7 that there is no high production cluster formation for Castro and Tibagi regions, which was already noted in Figure 6   Toledo and Cascavel region formed high-production clusters in both criteria and orders matching with the findings of MERCANTE et al. (2010) who studied soybean production in some municipalities of same region once it is a great soybean area in terms of production.In the case of Castro and Tibagi regions, there was a cluster of high production when considering first order for both criteria.
One of the main Low-High type outliers is observed in Telêmaco Borba municipality.This probably occurs due to its little aptitude for grain production, since the city is a main pulp and paper manufacturer, and according to SILVA et al. (2011) the city establishment was grounded to this sector.Therefore, there was no formation of High-Low outlier type in the region.
Regardless of criterion, there is a greater spatial autocorrelation when first order neighbourhood is adopted observing Global Index or formed clusters.This assertive corroborates with the first law of geography: "in the world, all things are related; however, closest things relate more intensely than those more distant" (TOBLER, 1970).
There are several factors affecting spatial distribution of soybean production such as soil type, management systems, weather etc.In compliance with weather, MARIANO (2010) observed that even in the light of important technologies, climate is still largely responsible for production and yield losses.
Overall DERAL (2003) states that Paraná agriculture is performed mostly on good quality soils, which are about 90%, of Oxisol group or well-structured Terra Roxa, having high level of natural fertility.
Soybean cropping has been responsible for mechanised plantations growth in Paraná State (DERAL, 2003).This means that topography acts as hindrance for plantation development, as according to PEREIRA (2002), soils with more than 20% slopes are inappropriate for mechanisation at any time of year, and even for animal traction implement use.
These data enabled to understand how soybean production is spatially distributed along Paraná State, differentiating high-yield and without production regions.This kind of information is important for production flow to plot strategies that might serve not only for grain storage system, but also for production outlet (exportation).
Major soybean production centres has great importance to provide the best places to install soybean transformation industries such as animal feed, oils, milk and meat.
It is noteworthy that there are several factors affecting spatial distribution of agricultural production, such as soil types, management system, and weather among others.In this way, we highlight the great importance of undergoing studies in this line to contemplate more information layers (factors and aspects) that might influence, at anyway, spatial distribution of agricultural production.

CONCLUSIONS
A spatial profile for soybean production in Paraná State was built by using ESDA techniques, specifically with Moran's Index (I) use as global and local, varying neighbourhood matrix order and criterion.

FIGURE 1 .
FIGURE 1. Location of Paraná state and studied cities.
PEROBELLI et al. (2007) indicate that the fact of Moran's index providing a single value of association as a measure for whole data may conceal spatial autocorrelation local patterns, with three different possible situations.Firstly, it involves the indication of insignificant Moran's I values, not always showing significant spatial autocorrelation.Secondly, when the Moran's I concealing local negative values and insignificant spatial autocorrelation.At third place, the evidence of negative global spatial autocorrelation can accommodate positive local spatial autocorrelation for certain data groups.

FIGURE 3 .
FIGURE 3. Production maps in tons (t) per municipality during all studied crop-years.

Figures 4
Figures 4 and 5 exemplify the Moran's index values obtained by Queen and Rook criteria with W (1) and W (2) matrices (shown in Tables2 and 3) by means of Moran Scatter Plots.There is higher concentration of municipalities (points) in the AA (high-high) and BB (low-low) quadrants, which explains Moran's index positive value.There are also, in both figures, cities in quadrants AB and BA, which represent negative spatial association, being not only represented in numerical value of overall index (Tables2 and 3).

FIGURE 5 .
FIGURE 5. Moran scatter plot considering Rook Criterion and first and second order neighbours.
for same neighbourhood order, with smaller AA cluster formations in 2008/2009 crop-year, what can be explained by disturbances previously mentioned.

FIGURE 7 .
FIGURE 7. LCM considering the Rook Criterion and first and second order neighbours.

TABLE 1 .
Soybean production Boxmap values for each crop-year in Paraná State.

TABLE 2 .
Moran's Index values (I) obtained by the queen and rook criteria for the matrix W(1).

TABLE 3 .
Moran's Index: Values (I) obtained by the queen and rook criteria for the matrix W(2).