COMPUTER-ASSISTED CARTOGRAPHY USING TOPOGRAPHIC PROPERTIES: PRECISION AND ACCURACY OF LOCAL SOIL MAPS IN CENTRAL MEXICO(1)

Local soil classes generate map units that are directly related to soil-landscape properties. Taking farmers' knowledge into consideration is therefore essential when automating the mapping procedure. The aim of this study was to map local soil classes by computer-assisted cartography (CAC), using several combinations of topographic properties produced by GIS (digital elevation model, aspect, slope, and profile curvature). A decision tree was used to find the number of topographic properties required for digital cartography of the local soil classes. The maps produced were evaluated based on two attributes of map quality, precision and accuracy. The evaluation was carried out in Central Mexico using three maps of local soil classes with contrasting landscape and climatic conditions (desert, temperate, and tropical). In the three areas, the precision (up to 56 %) of the CAC maps based on elevation as topographical feature was higher than when based on slope, aspect and profile curvature. The accuracy of the maps (boundary locations) was, however, low (up to 33 %); further research is required to improve this indicator.

Index terms: decision tree, digital elevation model, map quality.
(1) Received for publication in March 2010 and approved in January 2011.

INTRODUCTION
Soil maps generated by photo-interpretation are designed as landscape units, based on Jenny's paradigm of five soil-forming factors, which associates soil variation with climate, vegetation, topography, parent material, and time (Hudson, 1992). In this case, the photo-interpreter defines the elementary mapping units based on the interaction of these factors, which govern the occurrence of soils in the forms of the landscape (Bui, 2004). An alternative methodology, called Digital Soil Mapping (DSM) in soil science (McBratney et al., 2003) or predictive soil mapping (Scull et al., 2003), is the computer-assisted production of digital maps of soil classes and soil properties. DSM makes use of technological advances, including GPS receivers, remote sensing, computational tools such as geostatistical interpolation and inference algorithms, digital elevation models, and data mining. Semiautomated techniques and technologies are used to acquire, process and visualize information on soils and auxiliary aspects, reducing the costs (Hengl & Rossiter, 2003). Several methods for DSM have been developed, for example SoLIM (Zhu et al., 2001; Qi & Zhu, 2003) or scorpan-SSPFe (soil spatial prediction function with spatially autocorrelated errors) (McBratney et al., 2003). These models are based on the five soil-forming factors listed above and depend mainly on topographical features.
The most commonly used topographical features in DSM are elevation, slope, aspect, and profile curvature. The precision of the generated maps ranges from 50 to 88 % (Lagacherie & Holmes, 1997; Dobos et al., 2001; Moran & Bui, 2002; Hengl & Rossiter, 2003; Peng et al., 2003; Qi & Zhu, 2003; Schmidt & Hewitt, 2004; Scull et al., 2005; Giasson et al., 2006; Smith et al., 2006; Qi et al., 2006; Ziadat, 2007; Figueiredo et al., 2008; Schmidt et al., 2008; Hansen et al., 2009; Ballabio, 2010; Behrens et al., 2010). The maps produced by DSM are based on scientific data and the extrapolation of soil properties, but do not take local soil knowledge into consideration. Alternatively, local soil classification has been the central focus of studies undertaken worldwide to understand farmers' local knowledge about their soils, and the majority of these studies include the cartography of local soil classes (Barrera-Bassols & Zink, 2000; Niemeijer & Mazzucato, 2003; Ortiz-Solorio et al., 2005). Local soil maps are generated from the local knowledge of farmers, which identifies more soil-landscape relationships than scientific procedures (Lleverino et al., 2002; Krasilnikov & Tabor, 2003). The local soil classes are identified by the construction of cartographic units. These units are related to direct observations of the farmers on soil-landscape relations. In addition, the maps are strongly related to features considered important by the farmers, e.g., workability, yield, fruit quality, vegetation, and others (Ortiz-Solorio et al., 2001).
The objective of this study was to evaluate the precision and accuracy of maps produced with computer-assisted cartography (CAC) in three areas of Mexico with contrasting climate and landscapes. The map quality was estimated based on precision and accuracy (Brown, 1988). Precision refers to the dispersion of the soil properties from the central concept or typical profile of the cartographic unit. Accuracy represents the correct location of the soil boundaries. CAC-based maps for these three areas were produced using the following topographic properties: elevation, slope, aspect, and profile curvature. These are commonly used together, but in this study they were also considered individually.

Maps of local soil classes and topographic properties
The soil class or ground truth maps (Figure 1) published by Martinez et al. (2003) for the arid region, by Pajaro & Ortiz (1987) for the temperate region, and by Cruz Cadenas et al. (2008) for the tropical region were used; all were generated with the methodology of Ortiz et al. (1990). The maps were digitized in ArcView 8.1 of ESRI® (Shaner & Wrightsell, 2000) and imported into IDRISI® (Eastman, 2006). The digital elevation model (DEM) was extracted from the download system of the Continuo de Elevaciones Mexicano (INEGI, 2007) and resized to each study area. The pixel size was 28.5 x 28.5 m to enhance precision for automated cartography of local soil maps using the nearest neighbor technique (Smith et al., 2006). From the DEM, slope, aspect, profile curvature, flow, analytical hillshading, convergence index, wetness index and catchment area were generated using IDRISI® and the SAGA system (Cimmery, 2007).
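As a rough illustration of how slope and aspect can be derived from a DEM grid, the sketch below applies central differences to one interior cell. It is not the algorithm implemented in IDRISI or SAGA; the function name `slope_aspect`, the assumption that row 0 is the northern edge of the grid, and the toy elevation values are hypothetical, and only the 28.5 m cell size is taken from the text.

```python
import math

def slope_aspect(dem, row, col, cell_size=28.5):
    """Slope (degrees) and aspect (degrees clockwise from north) of one interior cell.

    Assumptions (not from the paper): dem is a list of rows, row 0 is the
    northern edge, and gradients are estimated by simple central differences.
    """
    # Elevation change per metre towards the east and towards the north
    dz_east = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_north = (dem[row - 1][col] - dem[row + 1][col]) / (2.0 * cell_size)

    # Slope is the angle of the steepest gradient
    slope = math.degrees(math.atan(math.hypot(dz_east, dz_north)))

    # Aspect is the compass direction of steepest descent (downslope direction).
    # On perfectly flat cells the aspect is undefined; this formula then returns 180.
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0
    return slope, aspect

# Toy grid (hypothetical): elevation rises by one cell size per cell towards the east
toy_dem = [[0.0, 28.5, 57.0],
           [0.0, 28.5, 57.0],
           [0.0, 28.5, 57.0]]
slope_deg, aspect_deg = slope_aspect(toy_dem, 1, 1)
```

For this toy surface the terrain climbs at 45 degrees towards the east, so the centre cell faces west.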

Pre-processing of the topographic properties
Before using the nine topographic properties (TP) to run a classification tree analysis, the multicollinearity effect, i.e., the problem of information overlap among the predictors, had to be corrected. Multinomial logistic regression was used to evaluate multicollinearity (Debella-Gilo & Etzelmüller, 2009; Kempen et al., 2009).
Regression coefficients were fitted for each local soil class using a single-hidden-layer neural network (Venables & Ripley, 2002). Two models were evaluated, one with the nine TP and one with four TP (DEM, aspect, profile curvature and slope); a χ² test was then applied to detect whether they differed from each other.

Training sites, classifier and input data
Polygons were projected onto the maps of local soil classes to be used as training sites. The IDRISI® decision tree was used as a classifier. A decision tree begins from a root containing the training sites, whose pixels are then separated by binary rules. If the separated pixels belong to a single class, they are combined to form a layer. If the separation contains pixels from different classes, an internal node is set and the separation process continues until the classification is finished (Quinlan, 1988). The combinations of the TP used in the CAC are presented in Table 1. To obtain maps 6, 7, 8, and 9, a single topographical property (DEM, slope, aspect, and profile curvature, respectively) was used at a time. The same combinations of attributes were used in the three study regions.
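The splitting logic described above can be sketched for a single topographic property as follows. This is a minimal illustration and not the IDRISI implementation: the Gini impurity criterion, the function names and the elevation values are assumptions made for the example, and only the class names are borrowed from the local soil classes mentioned in this study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Threshold giving the lowest weighted impurity, or None if no split helps."""
    best, best_score = None, gini(labels)
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best, best_score = threshold, score
    return best

def grow(values, labels):
    """Return a class name (pure group) or a (threshold, left, right) node."""
    if len(set(labels)) == 1:
        return labels[0]                      # pixels of a single class: stop
    threshold = best_split(values, labels)
    if threshold is None:                     # cannot separate further: majority class
        return Counter(labels).most_common(1)[0][0]
    left = [(v, l) for v, l in zip(values, labels) if v <= threshold]
    right = [(v, l) for v, l in zip(values, labels) if v > threshold]
    return (threshold, grow(*zip(*left)), grow(*zip(*right)))

def classify(tree, value):
    """Follow the binary rules from the root until a class name is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if value <= threshold else right
    return tree

# Hypothetical training pixels: elevation (m) and local soil class
elevation = [1800, 1820, 1850, 2100, 2150, 2300]
soil_class = ["Lama", "Lama", "Lama", "Arena", "Arena", "Blanca"]
tree = grow(elevation, soil_class)
```

Each internal node holds one binary rule on elevation, so a pixel is assigned by repeatedly comparing its elevation with a threshold, which mirrors how a map based on the DEM alone (map 6) would be produced.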

Evaluation of precision and accuracy
For the evaluation of precision, the Kappa index (Congalton, 1988) was used, and its confidence interval was calculated by ordinary nonparametric bootstrap (DiCiccio & Efron, 1996). A representative sample of each whole area of the local soil class (ground truth) maps was generated with 1 % of the total pixels. Stratified random spatial sampling was used, as proposed by François et al. (2003). These maps of sample points were crossed with the 27 CAC-based maps and with the local soil maps of the three studied areas. The results were 27 grids of the sample points (1 %) with the information of the CAC-based maps from TP, and three grids of sample points for the local soil classes. The Kappa index was calculated using the R system (Venables & Smith, 2010), comparing the sampling points of the ground truth with each set of nine maps of the three regions.
The map accuracy was evaluated according to Burgess & Webster (1984) as follows: first, a grid of 1 cm² was projected according to the scale of each map. Then, the distance between two consecutive boundaries along a line was determined in meters. In this way, the distance in the two directions, North-South and East-West, was measured and multiplied by 0.52 to generate the optimum distance; then the total length of the boundaries was divided by the mean optimum distance to produce the optimum sample size (number of points). The number of sample points that fell on boundaries (successes) was registered. The binomial test was applied to contrast the total accuracy of the maps using the R system (Venables & Smith, 2010).
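The precision measure can be illustrated with a short sketch of Cohen's Kappa and a bootstrap confidence interval computed from paired sample points. The analysis in this study was run in R; the code below is only an assumed equivalent, in which the function names, the percentile method for the interval, the number of resamples and the toy labels are all hypothetical.

```python
import random
from collections import Counter

def kappa(truth, predicted):
    """Cohen's Kappa: agreement between two label sequences, corrected for chance."""
    n = len(truth)
    p_observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_freq, pred_freq = Counter(truth), Counter(predicted)
    p_expected = sum((truth_freq[c] / n) * (pred_freq[c] / n) for c in truth_freq)
    if p_expected >= 1.0:
        # Both maps contain one and the same class; treated here as full agreement
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

def bootstrap_ci(truth, predicted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from an ordinary nonparametric bootstrap."""
    rng = random.Random(seed)
    pairs = list(zip(truth, predicted))
    estimates = []
    for _ in range(n_boot):
        # Resample the sample points with replacement, keeping each pair together
        resample = rng.choices(pairs, k=len(pairs))
        t, p = zip(*resample)
        estimates.append(kappa(t, p))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical sample points: ground truth class versus CAC-based class
ground_truth = ["Arena", "Arena", "Lama", "Lama"] * 25
cac_map = ["Arena", "Lama", "Lama", "Lama"] * 25
k = kappa(ground_truth, cac_map)
ci_low, ci_high = bootstrap_ci(ground_truth, cac_map)
```

Resampling whole (truth, prediction) pairs keeps the link between the two maps at each sample point, which is what makes the resampled Kappa values comparable with the original estimate.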
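The sample-size arithmetic and the binomial test described above can be sketched as follows. This is an assumed reading of the procedure rather than the authors' script: the function names, the rounding to a whole number of points, the two-sided form of the test, the reference proportion `p0` and all numeric inputs are hypothetical; only the factor 0.52 comes from the text.

```python
from math import comb

def optimum_sample_size(dist_ns, dist_ew, total_boundary_length, factor=0.52):
    """Number of sample points = total boundary length / mean optimum distance."""
    optimum_ns = factor * (sum(dist_ns) / len(dist_ns))   # North-South direction
    optimum_ew = factor * (sum(dist_ew) / len(dist_ew))   # East-West direction
    mean_optimum = (optimum_ns + optimum_ew) / 2.0
    return round(total_boundary_length / mean_optimum)

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binom_test_two_sided(successes, n, p0):
    """Exact two-sided p-value: sum of outcomes no more likely than the observed one."""
    observed = binom_pmf(successes, n, p0)
    tolerance = 1.0 + 1e-7   # guards against floating-point ties
    p_value = sum(binom_pmf(i, n, p0) for i in range(n + 1)
                  if binom_pmf(i, n, p0) <= observed * tolerance)
    return min(1.0, p_value)

# Hypothetical distances between consecutive boundaries (m) and boundary length (m)
n_points = optimum_sample_size([100.0, 300.0], [400.0], 15600.0)

# Hypothetical outcome: 10 of 10 points fall on a boundary, tested against p0 = 0.5
p_value = binom_test_two_sided(10, 10, 0.5)
```

With these toy inputs the mean optimum distance is 156 m, which gives 100 sample points; the accuracy of a map is then the share of those points that fall on a mapped boundary.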

RESULTS AND DISCUSSION
The χ² test showed no difference between the models with nine or four TP in most study regions. It is therefore possible to use the DEM, aspect, profile curvature and slope in classification tree analysis (Table 2). Debella-Gilo & Etzelmüller (2009) argued that elevation, aspect and slope are significantly correlated with soil spatial distribution because some terrain attributes are related with radiation, temperature, moisture and flow of materials, which in turn control pedogenesis.

Evaluation of Precision (Kappa index)
The maps with highest precision in the arid, temperate, and tropical regions were maps 1, 2, 3, 4, and 6, all of which included the DEM (Figure 2). According to the statistical test, the precision of the maps based on four attributes was not better in any area than that of map 6, based on the DEM only. This means that the CAC-based maps can be produced using only the DEM as input data. On the other hand, the map quality varies depending on the region: for the arid and temperate regions the precision was less than 45 %, and for the tropical region, the precision was higher than 57 % (Figure 2). The reason could be that the decision tree model for the tropical region had fewer errors and was better fitted to identify the local soil classes. These results showed that the DEM was a good attribute for the delimitation of a map of local soil classes in a tropical region. Moreover, the number of local soil classes identified in the maps of the temperate and arid areas affects map precision, because the inherent complexity of the distribution of local soil classes reduces the quality of the CAC-based maps. In addition, the maps of local soil classes of the arid and temperate regions have small soil classes and therefore proportional sampling must be used to obtain more reliable results (Schmidt et al., 2008). The poorest topographic properties for map delimitation were profile curvature, aspect, and slope as input data, individually as well as combined (maps 7, 8, and 9), which indicates that elevation is more closely related with the soil classes than the other three elements. Similar results were found by Ballabio (2009), who assessed 20 maps of prediction variables and derived topographic and geomorphometric data.
Maps 1, 2, 3, 4, 5, 6, 7, 8, and 9: see Table 1.

Evaluation of accuracy
According to the technique of Burgess & Webster (1984), the sample size was largest for the map of local soil classes of San Luis Potosi, from 448 points (map 9) to 2425 points (map 2), since it has the longest boundaries, followed by the temperate region, from 16 points (map 7) to 111 points (map 4), and finally the tropical region, from 26 points (map 1) to 58 points (map 8). The tropical region has longer boundaries than the temperate area, but the average distance between boundaries is greater, resulting in a smaller number of samples per ha (Table 3).
The accuracy obtained in this study for the maps of local soil classes was highest for the tropical region, followed by the maps of the temperate and arid regions. Similar to the results obtained in the case of precision, the maps with highest accuracy were those where the DEM was used as input data for CAC (Table 4). Nevertheless, the highest accuracy was less than 56 % in all maps generated.
The differences between accuracy and precision of the maps found in this study coincided with those of Lleverino et al. (2000), who found that these parameters could vary by more than 20 %. This means that a map can be precise but not have the appropriate accuracy in all cases. In this study, the difference between precision and accuracy was more than 30 % for the three study regions, which is a higher variation than reported by Lleverino et al. (2000).
The best CAC-based maps are presented in Figure 3. Map 4 of the arid region is considered the best of the nine maps, with a precision of 27 % and an accuracy of 10 % (Figure 3a). The low quality of the map is due to the fact that the local soil classes Arena, Arena gravosa, Calichuda, Cuerpo and Enlamada were underestimated by more than 20 % and the remaining classes were overestimated by more than 30 %. Map 6 of the temperate region had the highest precision and accuracy, with 45 and 14 %, respectively. However, the quality is still considered low because neither attribute exceeds 50 %. The main problems found were the overestimation by more than 60 % of the Arena and Lama soil classes, and the underestimation of the Blanca and Cacahuatuda soil classes by less than 16 % (Figure 3b). In the tropical area, map 6 was the best with 56 % precision and 33 % accuracy. The local soil class Arenal was overestimated by more than 100 %, which could be explained by the fact that some Barrial class sites were confused with the Arenal class, reducing the precision of estimation (Figure 3c).
Table 3. Sample size per map type and climatic region to calculate accuracy
(1) The term "empty" means that the decision tree classifier was able to identify only one class in the entire area.
Table 4. Binomial test (α = 0.05) to evaluate accuracy of boundaries

CONCLUSIONS
1. In general, the DEM was the topographic property that produced the best maps, because the fit of the local soil classes was better with elevation than with slope, profile curvature and aspect.
2. The precision of the best maps ranged from 27 to 56 % and the accuracy of the best maps from 10 to 33 %.
3. The precision and accuracy were highest for maps of the tropical region, while the decision tree models of the arid and temperate regions showed misclassifications.
The study showed that the precision of CAC-based maps could be acceptable for some regions whereas the accuracy of the maps still remains a problem.