Autoregressive modelling of species richness in the Brazilian Cerrado

Spatial autocorrelation is the lack of independence between pairs of observations at given distances within a geographical space, a phenomenon commonly found in ecological data. Taking into account spatial autocorrelation when evaluating problems in geographical ecology, including gradients in species richness, is important to describe both the spatial structure in data and to correct the bias in Type I errors of standard statistical analyses. However, to effectively solve these problems it is necessary to establish the best way to incorporate the spatial structure to be used in the models. In this paper, we applied autoregressive models based on different types of connections and distances between 181 cells covering the Cerrado region of Central Brazil to study the spatial variation in mammal and bird species richness across the biome. Spatial structure was stronger for birds than for mammals, with R values ranging from 0.77 to 0.94 for mammals and from 0.77 to 0.97 for birds, for models based on different definitions of spatial structures. According to the Akaike Information Criterion (AIC), the best autoregressive model was obtained by using the rook connection. In general, these results furnish guidelines for future modelling of species richness patterns in relation to environmental predictors and other variables expressing human occupation in the biome.


Introduction
Autocorrelation is the lack of independence between pairs of observations at given distances in time or space and is commonly found in ecological dataset (Legendre, 1993;Legendre and Legendre, 1998;Fortin and Dale, 2005).Many recent papers have discussed the importance of spatial autocorrelation when evaluating problems in geographical ecology, including gradients in species richness (Badgley and Fox, 2000;Lennon, 2000;Jetz and Rahbek, 2001;Rahbek and Graves, 2001;Diniz-Filho et al., 2003;Tognelli and Kelt, 2004).These respectively.Data from the literature (Marinho-Filho et al., 2002;Eisenberg and Redford, 1999;Emons, 1990;Embrapa, 2002;Fonseca et al., 1996) and specifically the following biodiversity websites were used to map the species: The Revista Brasileira de Zoologia (RBZ) site, SpeciesLink site, The Animal Diversity Web site (The University of Michigan Museum of Zoology) and the Site of the Global Biodiversity Information Facility Data Portal (GBIF) (a detailed species list and references are available from the authors upon request).A binary matrix was constructed by recording the geographic ranges of which species overlapped each cell, and species richness was calculated by summing the species present in the cells.Geographical coordinates of cell centroids (latitude and longitude) were also obtained for further spatial analyses.

Spatial description
Spatial autocorrelation measures the similarity between samples for a given variable as a function of spatial distance (see Legendre and Legendre, 1998).For quantitative variables, such as species richness, the Moran's I coefficient is the most commonly used coefficient in univariate autocorrelation analyses and is given by: where n is the number of cells, y i and y j y are the values of the species richness in cells i and j, y is the average of y and w ij is an element of the matrix W. In this matrix, papers show that autocorrelation analyses can be useful to provide a more detailed description of spatial structure in species richness data and to allow a better understanding of ecological processes driving richness (Legendre, 1993;Diniz-Filho et al., 2003).At the same time, it is now widely recognized that testing statistical hypotheses using standard methods (e.g., ANOVA, correlation and regression) in the presence of spatial autocorrelation will cause downward bias in the standard errors and, consequently, Type I error rates may be strongly inflated (Haining, 1990(Haining, , 2003;;Cressie, 1993;Legendre, 1993;Fortin and Dale, 2005).
Description of spatial patterns in data using correlograms and variograms is now straightforward (see Legendre and Legendre, 1998;Fortin and Dale, 2005).On the other hand, incorporating the autocorrelation structure into modelling, in a regression framework, may be a more complicated task.Autocorrelation analysis must be based on the spatial relationship between spatial units, but this must be established by taking into account the relationship between the processes underlying diversity and the geographic distances or connectivity among the spatial units analysed.For example, in a stream network, it is important to consider the links between units along the river flows and to take into account ecological barriers (Ganio et al., 2005).Formally, these alternative propositions must be codified into a weighting matrix W.However, for broad-scale patterns in species richness in terrestrial systems, it is difficult to establish these connections assuming spatial dynamics of ecological or biogeographical processes.Empirical evaluation of alternative spatial modelling strategies may be an initial solution, especially considering that models can be sensitive to misspecifications in the W matrix (Cressie, 1993).
In this paper, we evaluated the spatial patterns of mammal and bird species richness in the Brazilian Cerrado.Our goal is to discuss how changes in the definition of the spatial relationship among spatial units (i.e., grid cells) affect the statistical performance of the autoregressive models describing species richness.This may provide a basis for further analyses investigating the relationship between the environmental predictors and richness and consequently allow a better evaluation of the processes driving the spatial patterns in species richness.

Data
The extents of occurrence of the 138 non-volant mammals species (Marinho-Filho et al., 2002) and 751 birds species (Ridgely andTudor, 1989, 1994;del Hoyo et al., 1992;1994;1996;1997;1999;2001;2002;Junniper and Parr, 1998;Silva, 1995) found in the Brazilian Cerrado were mapped with a spatial resolution of 1º grid cell, with a total of 181 cells covering the Cerrado Biome (Figure 1).The gathered information on the mammals included 8 orders: Didelphimorphia, Xenarthra, Primates, Carnivora, Rodentia, Perissodactyla, Artiodactyla and Lagomorpha, We also used seven different criteria to create binary (0 or 1) matrices W, indicating whether pairs of localities are connected or not.We used the Delaunay triangulation, Gabriel, the Minimum Spanning Tree and the Relative Neighbour networks, and established rook and queen connections among the cell centroids (Figure 2; see also Legendre and Legendre, 1998;Fortin and Dale, 2005, for details).These connections are built using different criteria to establish the links among the cells.In short, according to the Delaunay Triangulation Criterion, for a triplet of points (i.e., cell centroids) to be connected, a circle that circumscribes them (i.e. the circle passing through the three points) must include no other points, whereas in Gabriel connections, two points are connected if the circle in which the diameter is the distance between the points includes no other points.According to the Relative Neighborhood Criterion, two points are connected if, and only if, there is no other points lying on the intersection between the two circles centered in the two points, whereas in the Minimum Spanning Tree all points are inter connected so that the length of this connection is the minimum possible.Rook and Queen connections are designed to match 'chess' movements (see Figure 2).
The autoregressive models based on these 15 different matrices W (6 binary connectives and distance-based using 9 values of ) were compared using different approaches, for mammal and bird richness.The R 2 values of the autoregressive model indicate the ability of each model to explain spatial structure in richness, whereas an autocorrelation analysis base on Moran's I in the term indicates the effectiveness in taking autocorrelation structure into account.Akaike information criterion (AIC) was also used to select the best model, within an information theory framework (see Burham and Anderson, 2002, for details).For each model, AIC corrected for small samples was computed as: where n is the number of cells, K is the number of parameters in the model and 2 is the variance of the residuals of each regression model.The variance of the residuals was used here as a proxy for the likelihood of the model given the data (Haining, 2003), whereas the term (n/n -K -1) is the small sample correction term and tends to one as n increases.We compared the AIC values of each model using AIC, which is the difference between AIC of each model and the minimum AIC found.A value higher than 10 indicates that a model has a poor fit relative to the best model, whereas a value less than 3 indicates that a model is equivalent to the best model (with the lowest AIC); model.The AIC values were also used to compute Akaike's weighting of each model (w), which provides evidence that the model is actually the best explanatory model.The values of w are usually standardized by their sum among all models evaluated, so they are dependent on the set of models used and are given by: w ij = 1 if the pair i,j of cells is within a given distance class interval (indicating cells that are "connected" in this class), and w ij = 0 otherwise.S indicates the number of entries (connections) in the W matrix.The value expected under the null hypothesis of absence of spatial autocorrelation is -1/(n-1).Detailed descriptions of the computations of the standard error of this coefficient are given in Legendre and Legendre (1998).
Moran's I usually varies between -1.0 and 1.0, for maximum negative and positive autocorrelation, respectively.Non-zero values of Moran's I indicate that richness values in cells connected at a given geographic distance are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of cells.The geographic distances among cell centroids can be partitioned into discrete classes, creating then successive W matrices and allowing computation of different Moran's I values for the same variable.This allows one to evaluate the patterns of autocorrelation as a function of spatial distance, in a graph called spatial correlogram, which furnishes a spatial description of the species richness.The number and definition of distance classes to be used in the correlograms are arbitrary, but a general methodological criterion is to try to maximise the similarity in the S values (number of connections) for the different Moran's I coefficients, so that they are more comparable.In this paper, correlograms were based on 15 distance classes (see Figure 4).

Spatial modelling
Spatial autocorrelation in mammal and bird richness (y) was modelled by an autoregressive model of the form: where W is the row-standardized weighting matrix (not decomposed as in the correlogram), is the autoregressive parameter and is the error vector.This model must be fitted by maximum likelihood procedures (Haining, 1990;2003;Cressie, 2003).Squared correlation between y and the estimated value ( Wy) furnishes the pseudo-R 2 of the model, expressing the proportion of variance in Y that is explained by an autoregressive process.
The autoregressive model above was fitted using various W matrices, derived from alternative ways to establish relationships between the spatial units (cells in the Cerrado grid).First, geographic distances among the cell centroids was used, and values in the matrix W were obtained using inverse-powered functions, given by: where D ij is the geographic distance between centroids of cells i and j .Values of ranging from 1 to 5, with steps of 0.

Results
Both mammal and bird species richness show a clear spatial pattern in the Brazilian Cerrado, with higher richness concentrated in the south-eastern region of the biome, decreasing toward the north (Figure 3).High count.In principle, models based on binary connections are better than models based on the inverse of geographic distances.
The AIC analysis allowed a more effective comparison among these alternative models (Table 1).In both mammals and birds, there is no model with AIC smaller than 3, indicating that, in principle, there is a unique solution for modelling richness.The best models were obtained using the rook connection (Figure 2), and the standardized Akaike weights suggest a chance higher than 99.9% that these are the best models among those tested.They yield R 2 values of 0.938 and 0.972 for mammals and birds, respectively.Coherent with patterns revealed in the spatial correlograms, birds display stronger spatial structure than mammals, with higher fit of autoregressive models.
However, it is interesting to note that Moran's I in the best model residuals for mammals displays a relatively high negative autocorrelation value -0.193, so a slight over-correction of the spatial structure probably occurred in this case (see Griffith, 2002).For both mammals and birds, the second best models were based on the Gabriel network (Figure 2), although AIC is slightly larger than 10, indicating a low chance that this is the best model and, for mammals, the residual autocorrelation is still relatively high (-0.143).

Discussion
Different forms of autoregressive models have been recently applied in geographical ecology (Lichstein et al., 2002;Kelt and Tognelli, 2004;Fortin and Dale, 2005).These models have been mainly used as a way to take the spatial structure into account in data and, at the same time, to evaluate how different environmental predictors are related to spatial variations in species richness.However, in most of these papers, researchers assume a given form of matrix W and do not explore alternative scenarios for the relationship among spatial units and the weighting of autoregressive model.
Our results show that the autoregressive model, used here only to analyse spatial structure in richness, is rather sensitive to variations in the definition of W and, con-richness values also appear in the western region of the biome, but this patch is clearer for mammals.Indeed, spatial correlogram confirm this strong spatial structure, with Moran's I coefficients large in the first distance class (0-245 km) and decreasing monotonically with the increasing of geographic distances (Figure 4).
Autoregressive modelling based on the different W matrices (Table 1) reveals a large variation in model fit, both for mammals and birds.As expected, connections based on the minimum spanning tree were not adequate and showed a very poor fit, and will be not considered further.The R 2 values ranged from 0.77 to 0.94 for mammals and from 0.77 to 0.97 for birds.Relatively high values of Moran's I (i.e., I 0.1) remain in the residuals of a few models.In these cases, modelling was not effective in taking the autocorrelation structure into ac-  it is important to consider that a negative autocorrelation in the residuals remains, so a more careful evaluation should be performed.In practice, the consequence is that the effect of environmental predictors would be underestimated due to the overestimation of the spatial component in species richness.Of course, more complex models could be tested using alternative scenarios, for example taking into account the different biogeographical or ecological boundaries based on vegetation types or historical barriers, but this is beyond the scope of this paper.
We provide here guidelines for a more effective modelling of the richness patterns of mammals and birds sequently, using these models to relate richness to environmental predictors requires more effort around the definition of W.
We recognize that it is difficult to find theoretical arguments to support the use of a given W matrix to evaluate richness patterns at broad scales, so empirical evaluations, as performed here, are important.In our analysis of the mammals and birds in the Cerrado, AIC-based model selection was which effective in establishing a single model as the best one, based on a rook connection among cells.A model based on Gabriel connections followed this, according to AIC.For mammals, although AIC selected the rook connection as the best model, Table 1.Results of the autoregressive models including the following statistics: Coefficient of determination (R 2 ), autoregressive parameter ( ), standard error (SE), number of parameters (K), Akaike Information Criterion (AIC), residuals' autocorrelation (as estimated by the Moran's I coefficient), difference between AIC of each model and the minimum AIC found ( (AIC)), Akaike's weighting of each model (w i ) and standardized w i (w i /w t in the Brazilian Cerrado.In some sense, it matches the theoretical evaluation by Griffith (1996), who gave some advice on choosing between alternative W matrices.For predictive purposes, using any of the models discussed here is better than assuming that no spatial autocorrelation exists, especially considering the relative high R 2 values of all autoregressive models.This indicates that ignoring spatial components using ordinary regression models (OLS) probably will furnish biased results.Another important guideline is the preference of using low order (= short distance) expressions of spatial structure, in which close localities are more heavily weighted, when compared to distant ones.Indeed, our results show that the AIC values based on connections are usually higher than those obtained with the inverse distances functions.Also, there is a perfect decrease in AIC values when increasing the values.However, Griffith (1996) indicated that over-specification of the connections in the model (in this case, generating high negative autocorrelation in the residuals of the model) is worse than underspecification.Thus, although rook connections were selected by AIC as the best model, followed by Gabriel connections, future modelling must be aware of inflated negative spatial structures in residuals and, eventually, further models may be tested if more complex autoregressive models are to be obtained (i.e., by adding environmental predictors -see below).Finally, overall spatial patterns of species richness in the Cerrado are not well established for most groups, but there seems to be a general decrease from south to north, that is more accentuated in amphibians than the decrease in bird and mammal richness shown in this paper (see Diniz-Filho et al., 2005a,b).There is currently a consensus that climatic variables, mainly the effects of available energy, water and increasing productivity, explain most of the variation in species richness of endothermic organisms at broad spatial scales (Allen et al., 2002;Hawkins et al., 2003;Currie et al., 2004).At the same time, correlations between richness and components of human occupation can be a consequence of this same energetic response (Balmford et al., 2001; but see Araújo, 2003) or may reflect patterns of knowledge of biodiversity (Diniz-Filho et al., 2005a).In both cases, the pattern has important consequences for biodiversity conservation.Although overall geographic distributions of mammals and birds are relatively well known, their patterns in the Brazilian Cerrado might still reflect knowledge effects, as previously shown for anurans (Diniz-Filho et al., 2005a), due to many reasons.On the one hand, geographic ranges used here, expressed as extents of occurrence, are overestimated because of the high rate of habitat loss in the biome.On the other hand, there may be biases in the estimation of these geographic ranges (and even in the very species richness), mainly because of the paucity of knowledge in the northern part of the biome.Despite these problems, the results obtained here are important because they furnish clear guidelines for future modelling of richness patterns in relation to envi-

Figure 1 .
Figure 1.Distribution of the 181 cells that were used to analyse the spatial variation of mammal and bird species richness in the Cerrado Biome.

Figure 2 .
Figure 2. Networks connecting the center of the cells which were used to estimate the different autoregressive models.