Digital soilscape mapping of tropical hillslope areas by neural networks

Geomorphometric variables are used in digital soil mapping because they are strongly correlated with the arrangement and distribution of the pedological components of landscapes. In this research, the relationships between environmental components of tropical hillslope areas in Rio de Janeiro State, Brazil, were evaluated with the aid of geoprocessing techniques, and artificial neural networks (ANN) and the maximum likelihood algorithm (MaxLike) were applied to soilscape mapping. The results were compared with the original map. The ANN architectures with seven and five neurons in the hidden layer produced the best classifications when systematically obtained samples were used. When random samples were used, the best architectures were those with 22 and 16 neurons in the hidden layer. The systematic sampling method provided a global accuracy of 70 % for the ANN and 65.9 % for MaxLike. With random sampling, the ANN reached a global accuracy of 69.6 % and MaxLike 62.1 %, considering the total study area in relation to the reference map. In conclusion, the digital elevation model (DEM) and its derived attributes can contribute to the understanding of the soil-landscape relationship of tropical hillslope areas, the use of ANN and MaxLike is feasible for digital soilscape mapping, and the ANN can make soilscape map delineation faster and less expensive.


Introduction
The study of the relationships between environmental components can contribute to a better understanding of our environment. Improvements in computational performance, information technology and GIS techniques, together with the ever-increasing volume and sophistication of geographical data, have led to extraordinary advances in digital soil mapping (Hempel et al., 2006). Digital soil mapping (DSM) has been actively investigated for more than a decade (MacMillan, 2006). However, it has not yet achieved widespread acceptance for operational mapping by mainstream mapping agencies.
In Brazil, DSM appears to be still at an early stage, and soil researchers have only recently begun to work in this field. In fact, only a few Brazilian papers in this field were presented in Mendonça-Santos and McBratney (2006), such as those by Carvalho Júnior et al. (2006), Giasson et al. (2006) and Valladares and Hott (2006). The use of artificial neural networks (ANN) in DSM provides a means to automatically quantify the spatial distribution of tropical hillslope soilscapes. Brown et al. (1998) used ANNs to classify glaciated landscapes, Chang and Islam (2000) used ANNs to estimate soil physical properties, and Giles and Franklin (1998) studied an automatic digital mapping approach to classify hillslope units. Moreover, Behrens et al. (2005) used ANNs to predict soil units.
The main goal of this research is to use ANNs to classify soilscapes in hillslope areas (the "Mar de Morros" domain) in Rio de Janeiro State, identifying the arrangement and distribution of the geomorphometric and pedological components. The research was carried out with the aid of geoprocessing techniques and supervised classification, focusing on the identification of soilscapes by geomorphometric features. A further aim is to compare the ANN soilscape classification results with the original map of the area and with the map generated by the maximum likelihood classifier (MaxLike).

Materials and Methods
The study area covers 16,470 ha and is located in the northern part of the Northwest region of Rio de Janeiro State, Brazil (47º48' to 47º54' W, 20º48' to 21º00' S). It is covered by the "Varre Sai" sheet of the topographic chart (IBGE, 1991) and lies within the Itabapoana and Muriaé watersheds (Figure 1). The digital cartographic base (1:50,000 scale) contains contour lines at 20-m intervals, spot elevations and the drainage network, in the UTM projection with the "Córrego Alegre" horizontal datum.
The neural network simulator JavaNNS (Java Neural Network Simulator; JavaNNS, 2001), developed at the Wilhelm-Schickard Institute for Computer Science in Tübingen (Germany), was used. This simulator is based on the kernel of the Stuttgart Neural Network Simulator 4.2 (SNNS, 1998), with a new graphical interface. The DEM, from which most of the ANN input attributes were generated, has the following characteristics (in meters): cell size of 20, maximum value of 1017, minimum of 190, mean of 638 and standard deviation of 164, with 901 rows and 457 columns.
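As a quick consistency check, the grid dimensions given above determine the rectangular extent of the DEM. The short calculation below uses only the values stated in the text (20-m cells, 901 rows, 457 columns); the variable names are illustrative.

```python
# Worked check of the DEM grid extent (all input values are taken from the text).
cell_size_m = 20            # DEM cell size in metres
n_rows, n_cols = 901, 457   # "901 rows and 457 columns"

n_cells = n_rows * n_cols               # number of grid cells
area_m2 = n_cells * cell_size_m ** 2    # each cell covers 20 m x 20 m = 400 m^2
area_ha = area_m2 / 10_000              # 1 ha = 10,000 m^2

extent_ns_km = n_rows * cell_size_m / 1000  # north-south extent of the grid
extent_ew_km = n_cols * cell_size_m / 1000  # east-west extent of the grid

print(f"{n_cells} cells, {area_ha:.0f} ha, {extent_ns_km:.2f} km x {extent_ew_km:.2f} km")
# prints: 411757 cells, 16470 ha, 18.02 km x 9.14 km
```

The full rectangular grid therefore covers about 16,470 ha, the same figure reported for the study area.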
Among the geomorphometric parameters used in this research, the hydrologically corrected digital elevation model (DEM) is the most important. It was generated from contour lines, spot elevations and the drainage network using the TOPOGRID tool (ArcInfo 9), followed by removal of spurious depressions, according to Hutchinson (1989) and Hutchinson and Gallant (2000). Secondary attributes were then derived from the DEM by map algebra: slope (S), aspect (A), curvature (C), plan curvature (PC), profile curvature (PRC), surface flow accumulation (SFA), surface flow direction (SFD), relative altimetry of sub-basins (RASB) (Carvalho Júnior et al., 2008) and Euclidean distance to the drainage network (EDH). These geomorphometric parameters represent relief, one of the soil-forming factors (McBratney et al., 2003; Chagas et al., 2010), and they are the most widely used factors in soil mapping because of their strong correlation with the spatial variability of soil attributes across the landscape (Ceddia et al., 2009).
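The attributes above were derived in ArcInfo. To illustrate what the first two of them represent, the sketch below computes slope and aspect from a DEM array with central finite differences (`np.gradient`). It is an assumption-laden stand-in, not the ArcInfo algorithm, which may use a different neighbourhood scheme; the north-up grid orientation and the function name `slope_aspect` are also assumptions.

```python
import numpy as np


def slope_aspect(dem, cell_size=20.0):
    """Slope (degrees) and aspect (degrees clockwise from north) of a DEM grid.

    Assumptions: row 0 is the northern edge, columns increase eastward, and
    derivatives are central differences from np.gradient. Aspect is undefined
    on flat cells; the value returned there is arbitrary and should be masked.
    """
    dz_drow, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    dz_dy = -dz_drow  # rows increase southward, so flip sign to get a northward derivative
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the azimuth of steepest descent, i.e. of the vector (-dz_dx, -dz_dy)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

For a surface that rises 2 m per 20-m cell toward the east (a 10 % grade), the sketch returns a slope of about 5.71° everywhere and an aspect of 270° (facing west).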
Training of the ANNs was stopped after 20,000 cycles, each cycle presenting the whole set of training samples. The learning rate (η) was reduced as training progressed: (i) η = 0.2 from 0 to 10,000 cycles; (ii) η = 0.1 from 10,000 to 15,000 cycles; and (iii) η = 0.075 from 15,000 to 20,000 cycles. The ANN architecture consists of an input layer with one neuron per input discriminating variable, one hidden layer, and an output layer with as many neurons as there are informational classes. The networks were trained with the backpropagation learning algorithm and are fully connected, that is, every neuron in one layer is connected to every neuron in the next layer.
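The training set-up described above can be sketched in NumPy. This is not JavaNNS code: the logistic activation, the uniform weight initialization and the batch (once-per-cycle) weight update are assumptions, since the text specifies only the architecture, the 20,000-cycle stopping rule and the learning-rate schedule. The function names are illustrative.

```python
import numpy as np


def learning_rate(cycle: int) -> float:
    """Stepwise schedule from the text: 0.2 (<10,000), 0.1 (<15,000), then 0.075."""
    if cycle < 10_000:
        return 0.2
    if cycle < 15_000:
        return 0.1
    return 0.075


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, T, n_hidden, n_cycles=20_000, seed=0):
    """Batch backpropagation for a fully connected input-hidden-output network.

    X: (n_samples, n_inputs) features already rescaled to [0, 1]
    T: (n_samples, n_classes) one-hot targets, one output neuron per class
    Assumed (not stated in the text): logistic units, weights drawn from
    U(-0.5, 0.5), and one weight update per cycle over all samples.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    sse = []
    for cycle in range(n_cycles):
        H = sigmoid(X @ W1 + b1)              # hidden-layer activations
        Y = sigmoid(H @ W2 + b2)              # output-layer activations
        err = Y - T
        sse.append(float(np.sum(err ** 2)))   # SSE before this cycle's update
        # Error terms (deltas) for the gradient of 0.5 * SSE
        d_out = err * Y * (1.0 - Y)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        eta = learning_rate(cycle)
        W2 -= eta * H.T @ d_out
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * X.T @ d_hid
        b1 -= eta * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), sse


def predict(params, X):
    """Class index = output neuron with the highest activation."""
    W1, b1, W2, b2 = params
    return np.argmax(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), axis=1)
```

For the networks in this study, `X` would have ten columns (one per terrain attribute) and `T` five columns (one per soilscape unit).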
To evaluate classification accuracy, the kappa statistic and its variance were used; both were obtained from the confusion (error) matrices. All feature (discriminating variable) values were rescaled to the range 0 to 1 because of their different magnitudes, and were stacked into a single multi-band layer. Samples were acquired in two ways, i.e., by systematic and by random sampling, by extracting the cell values of each attribute and generating the training and validation files in the JavaNNS format. Both sampling schemes were used for training and validation in order to evaluate ANN classification performance. In the first case, the systematic samples were selected using knowledge gained during fieldwork, and they represent each soilscape shown in Figure 1; the fieldwork sites were selected a priori. In the second case, the randomly selected samples were collected independently, without any constraints, to be representative of each soilscape unit.
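The rescaling and file-generation steps can be sketched as follows. Two assumptions apply: each attribute band is min-max rescaled over the whole raster, and the training and validation files follow the SNNS version 3.2 pattern-file layout read by JavaNNS. That header layout is reproduced from memory and should be checked against the SNNS/JavaNNS manual. The function names are hypothetical, and nodata cells are not handled.

```python
import time

import numpy as np


def minmax_rescale(stack):
    """Rescale each band of an (n_bands, rows, cols) attribute stack to [0, 1].
    Constant bands are mapped to 0 to avoid division by zero (an assumption)."""
    stack = np.asarray(stack, dtype=float)
    lo = stack.min(axis=(1, 2), keepdims=True)
    hi = stack.max(axis=(1, 2), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (stack - lo) / span


def snns_pattern_text(inputs, targets):
    """Serialize samples in an SNNS V3.2-style pattern file (layout assumed).
    inputs: (n_samples, n_inputs) rescaled values; targets: (n_samples, n_outputs) one-hot."""
    inputs, targets = np.asarray(inputs), np.asarray(targets)
    lines = [
        "SNNS pattern definition file V3.2",
        f"generated at {time.asctime()}",
        "",
        f"No. of patterns : {inputs.shape[0]}",
        f"No. of input units : {inputs.shape[1]}",
        f"No. of output units : {targets.shape[1]}",
        "",
    ]
    for k, (x, t) in enumerate(zip(inputs, targets), start=1):
        lines += [f"# Input pattern {k}:", " ".join(f"{v:.6f}" for v in x),
                  f"# Output pattern {k}:", " ".join(str(int(v)) for v in t)]
    return "\n".join(lines) + "\n"
```

The returned text can be saved with, for example, `pathlib.Path("train.pat").write_text(...)`. The file name is illustrative.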
The validation samples for the ANNs were all obtained randomly, regardless of location. Table 1 shows the number of training and validation samples acquired by systematic and by random sampling. Accuracy was evaluated using the soilscape map as reference data.
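The accuracy measures named above can be computed from a confusion matrix as sketched below. The text does not say which variance formula was used for kappa. The sketch assumes the widely used large-sample (delta-method) approximation, with rows as classified labels and columns as reference labels. Kappa and its variance are the same under either orientation. Function names are illustrative.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_classes):
    """Rows = classified labels, columns = reference labels (assumed convention)."""
    idx = np.asarray(predicted) * n_classes + np.asarray(reference)
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)


def kappa_with_variance(cm):
    """Global accuracy, Cohen's kappa and its large-sample (delta-method) variance."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    rows, cols = cm.sum(axis=1), cm.sum(axis=0)         # marginal totals n_i+ and n_+i
    t1 = np.trace(cm) / n                               # observed agreement = global accuracy
    t2 = np.sum(rows * cols) / n ** 2                   # agreement expected by chance
    t3 = np.sum(np.diag(cm) * (rows + cols)) / n ** 2
    t4 = np.sum(cm * (rows[None, :] + cols[:, None]) ** 2) / n ** 3
    kappa = (t1 - t2) / (1 - t2)
    var = (t1 * (1 - t1) / (1 - t2) ** 2
           + 2 * (1 - t1) * (2 * t1 * t2 - t3) / (1 - t2) ** 3
           + (1 - t1) ** 2 * (t4 - 4 * t2 ** 2) / (1 - t2) ** 4) / n
    return t1, kappa, var
```

For a balanced two-class matrix `[[45, 5], [5, 45]]`, this gives a global accuracy of 0.9, a kappa of 0.8 and a variance of 0.0036.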

Soilscape supervised classification with systematic samples
Classifications were carried out with the ANN and the MaxLike algorithm. The ANNs had ten neurons in the input layer, each corresponding to one attribute: aspect, relative altimetry, curvature, slope, flow direction, flow accumulation, Euclidean distance to the drainage network, altimetry, profile curvature and plan curvature. The informational classes are the five defined soilscape units (Table 1), so the ANN output layer had five neurons. The numbers of training and validation samples collected randomly and systematically are shown in Table 1. For the systematically collected samples, several architectures were tested in search of one with the minimum SSE (sum of squared errors) and as few hidden neurons as possible. Thus, ANN architectures with ten input neurons and five output neurons were trained sequentially.
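The sequential (constructive) search over hidden-layer sizes can be sketched as a loop that trains one network per size and ranks the sizes by final SSE, preferring smaller networks on ties. `train_fn` is a hypothetical callable standing in for a full training run. In the usage below, only the SSE values for five (9.18) and seven (4.0) hidden neurons come from the text; the value 50.0 used for the other sizes is a placeholder.

```python
from typing import Callable, Dict, List, Tuple


def constructive_search(train_fn: Callable[[int], float],
                        max_hidden: int = 10) -> Tuple[List[int], Dict[int, float]]:
    """Train one network per hidden-layer size, 1..max_hidden, and rank the sizes.

    train_fn (hypothetical) trains a network with the given number of hidden
    neurons, e.g. for 20,000 cycles, and returns its final SSE.
    Sizes are ranked by SSE; ties go to the smaller network.
    """
    sse_by_size = {h: train_fn(h) for h in range(1, max_hidden + 1)}
    ranking = sorted(sse_by_size, key=lambda h: (sse_by_size[h], h))
    return ranking, sse_by_size


# Usage with placeholder SSE values (only 9.18 and 4.0 are from the text)
reported = {5: 9.18, 7: 4.0}
ranking, sse = constructive_search(lambda h: reported.get(h, 50.0))
print(ranking[:2])  # [7, 5]
```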
After 20,000 training cycles, increasing the number of hidden neurons from one to ten (constructive method), the lowest SSE values were obtained with five hidden neurons (SSE = 9.18) and seven hidden neurons (SSE = 4.0). The same samples used for the ANNs were also used for MaxLike, to test and compare the results. The validation file was then applied to the two ANN architectures and to the MaxLike classifier, and the classification results were compared with the reference map to generate the ANN and MaxLike confusion matrices and derived accuracy measures (Table 2). Overall, the ANN with seven hidden neurons achieved a global accuracy of 88.3 % and a kappa value of 0.851 (Table 2). The Oxisol soilscape (S1) had the highest accuracy (99.7 %), followed by the Aquent Entisol soilscape (S3, 97.7 %) and the Ultisol soilscape (S2, 87.1 %).
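The text does not state whether the per-class accuracies are producer's or user's accuracies. The sketch below computes both from a confusion matrix, assuming rows are classified labels and columns are reference labels. A class with no samples in a row or column would cause a division by zero.

```python
import numpy as np


def per_class_accuracy(cm):
    """Producer's and user's accuracy for each class.
    Assumes rows = classified labels and columns = reference labels."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    producers = diag / cm.sum(axis=0)  # correct / reference total (complement of omission error)
    users = diag / cm.sum(axis=1)      # correct / classified total (complement of commission error)
    return producers, users
```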
The MaxLike classification had a lower global accuracy, 81.9 %.
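The MaxLike runs used the provided executable `Max_like_cof`, whose internals are not described. The sketch below shows the textbook Gaussian maximum likelihood classifier that the name suggests. It fits one multivariate normal distribution per class and assigns each sample to the class with the highest log-likelihood. Equal class priors and a small ridge term added to each covariance matrix are assumptions, and the class name is illustrative.

```python
import numpy as np


class GaussianMaxLike:
    """Maximum likelihood classifier with one multivariate normal per class.

    Assumptions (not stated in the text): equal class priors and a small ridge
    term added to each covariance matrix so that it can be inverted.
    """

    def __init__(self, ridge=1e-6):
        self.ridge = ridge

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + self.ridge * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, inv_cov, logdet in self.params_:
            d = X - mean
            maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)  # squared Mahalanobis distance
            scores.append(-0.5 * (logdet + maha))           # log-likelihood up to a shared constant
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```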
A significance matrix of the kappa index was built from these results (Table 3). Off-diagonal values greater than 1.96 indicate significant differences between classifiers at the 95 % confidence level. There is no significant difference between ANN5 and ANN7, and both gave better results that differ significantly from the MaxLike classification. The S1 and S3 classes had better accuracy with the ANN classifications (Table 3), whereas the S2 and S3 classes were better classified by the MaxLike algorithm. Therefore, ANN5 and ANN7 were used to classify the study area, and MaxLike was used only for comparison. For the validation samples, MaxLike had a global accuracy of 81.9 % and a kappa of 0.767 (Table 2). In Tables 2 and 3, S1 = Oxisols, S2 = Ultisols, S3 = Aquent Entisols, S4 = Inceptisols and S5 = rock outcrops.
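A kappa significance matrix of this kind can be sketched with the usual test for two independent kappas: Z = |κa − κb| / √(var a + var b), compared with 1.96. The diagonal is assumed to hold the single-kappa test Z = κ / √var. The variances are not reported in the text, so the numbers in the test below are hypothetical.

```python
import numpy as np


def kappa_z_matrix(kappas, variances):
    """Pairwise Z statistics between independent classifications.

    Off-diagonal: Z = |k_a - k_b| / sqrt(var_a + var_b); values above 1.96 mean
    a significant difference at the 95 % level. Diagonal (assumed layout):
    Z = k / sqrt(var), testing each kappa against zero.
    """
    k = np.asarray(kappas, dtype=float)
    v = np.asarray(variances, dtype=float)
    z = np.abs(k[:, None] - k[None, :]) / np.sqrt(v[:, None] + v[None, :])
    np.fill_diagonal(z, k / np.sqrt(v))
    return z
```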
Carvalho Junior et al.

Soilscape supervised classification with random samples
Following the same procedure used for the ANN with systematic samples, different architectures were tested, and those with 22, 16, 17, 11 and 15 neurons in the hidden layer were selected. The kappa significance matrix for each classification (Table 4) showed that the ANN14 results differed from those of ANN16 and ANN22, whereas there was no difference among the other ANNs. When MaxLike was compared with all the ANNs, it differed from all of them, indicating a worse performance. Thus, ANN22, ANN16, ANN17, ANN11 and ANN15 were used to classify the soilscapes of the area. Although MaxLike performed worse, it was retained for comparison.

Study area classifications
To evaluate the ANN and MaxLike classifications of the area, the confusion matrices, global accuracy and kappa index were calculated (Table 5). The best global accuracies were obtained with ANN5 and ANN16, at 70 % and 69.6 %, respectively, whereas MaxLike had a lower global accuracy (65.9 %). The kappa coefficients of these three classifiers, ANN5, ANN16 and MaxLike, were 0.55, 0.55 and 0.49, respectively.
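For the whole-area evaluation, the classified raster is compared cell by cell with the reference map. A minimal sketch follows. It assumes both rasters are integer arrays of the same shape, with soilscape classes coded 0 to n_classes − 1 (S1 to S5 → 0 to 4), and a hypothetical nodata value of −1 for cells to be ignored.

```python
import numpy as np


def map_agreement(classified, reference, n_classes, nodata=-1):
    """Cell-by-cell comparison of a classified raster with a reference map.

    Returns (confusion matrix, global accuracy, kappa). Rows = classified,
    columns = reference. Cells equal to `nodata` in either raster are skipped.
    """
    classified, reference = np.asarray(classified), np.asarray(reference)
    valid = (classified != nodata) & (reference != nodata)
    idx = classified[valid].astype(int) * n_classes + reference[valid].astype(int)
    cm = np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)
    n = cm.sum()
    p_o = np.trace(cm) / n                                  # global accuracy
    p_e = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2  # chance agreement
    return cm, p_o, (p_o - p_e) / (1 - p_e)
```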
Figure 2 shows the ANN5 classification. It can be compared visually with Figure 1, which presents the reference map and on which the discussion of inclusion and omission errors is based. Visually, all classification results show consistent shapes; for example, the Aquent Entisol soilscape follows the drainage network. Moreover, only small groups of scattered cells were delineated inside the soilscapes, indicating that the smaller ANN architectures generalize the soilscape classes better. In the ANN16 classification, the soilscape classes appear relatively more scattered throughout the study area, and even more so in the MaxLike classification. In this visual evaluation, the boundary between the Ultisols (S2) and Oxisols (S1) is well delineated.
As a further evaluation, two Oxisol profiles described in the reference soil survey map of Rio de Janeiro State (Carvalho Filho et al., 2003) were located and compared with the three maps generated by ANN5, ANN22 and MaxLike. Both Oxisol profiles were correctly delineated by all three classifiers, according to their geographic positions. According to the scale proposed by Landis and Koch (1977), the kappa values indicated good quality. All global accuracies were at least 65.9 %, which is considered satisfactory. The ANN5 and MaxLike classifications agree with field knowledge, indicating that curvature is an important attribute for the classification, especially when combined with other attributes such as Euclidean distance, relative altimetry, slope and altimetry (Figure 3).
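The Landis and Koch (1977) scale used above can be applied as a simple lookup from a kappa value to its verbal agreement band. The sketch below encodes those bands; the function name is illustrative.

```python
def landis_koch(kappa: float) -> str:
    """Verbal agreement band for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

Applied to the values reported here, the validation-sample kappas of 0.851 (ANN7) and 0.767 (MaxLike) fall in the "almost perfect" and "substantial" bands. The whole-area kappas of 0.55 (ANN5, ANN16) and 0.49 (MaxLike) fall in the "moderate" band.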

Conclusions
Attributes derived from the DEM, used separately or in groups, can contribute to understanding the soil-landscape relationship in tropical hillslope areas. Classifiers such as ANNs and MaxLike are feasible for digital soilscape classification and can assist map delineation, and the terrain attributes discriminate the soilscapes considered. The classifications of the area by the ANN classifiers were all better than those obtained with MaxLike. The JavaNNS simulator is recommended because it has a user-friendly interface and is free software. The backpropagation method gives an acceptable classification error after training, but the true global accuracy depends on collecting ground control points during new field campaigns.
The ANN classification procedure includes: (i) definition of the input parameters and output soilscapes; (ii) collection of training and validation samples; (iii) generation of training and validation files in the JavaNNS format; (iv) training of the ANN architectures and definition of the best ones; (v) validation of the trained ANNs; (vi) generation of the confusion matrix; (vii) comparison with the MaxLike results; (viii) application of the trained ANNs and MaxLike to the study area; and (ix) generation of thematic images for comparison.

The following software was used in this study: (i) ARC/INFO version 8.2 (ESRI), to produce the DEM and other attributes; (ii) ArcView GIS version 3.2a (ESRI), to view computed results; (iii) JavaNNS, Java Neural Network Simulator, version 1.1 (University of Tübingen); (iv) the executables funcpow, gerapat and Max_like_cof (provided by Professor Carlos A. O. Vieira, Civil Engineering Department, Viçosa Federal University), for data management; (v) ERDAS IMAGINE version 8.5 (ERDAS Systems), to collect data samples; and (vi) Microsoft Excel 2000 (Microsoft Corporation), for data management.

Figure 1 - Area location and the defined soilscape units in Rio de Janeiro State, Brazil.

Figure 2 - Classification of the area by ANN5.

Figure 3 - Comparison between the reference map and the ANN5 and MaxLike classifications.

Table 1 - Numbers of training and validation samples for the five soilscape units.

Table 2 - Confusion matrix, kappa coefficient and global accuracy of the ANN and MaxLike classifications of the validation samples.

Table 3 - Significance matrix of the kappa index for the ANN and MaxLike tests.

Table 4 - Significance matrix of the kappa statistic for the validation samples.

Table 5 - Global accuracy and kappa coefficient of the classification results for the study area.