Digital soil mapping using multiple logistic regression on terrain parameters in southern Brazil

Giasson, Elvio; Clarke, Robin Thomas; Inda Junior, Alberto Vasconcellos; Merten, Gustavo Henrique; Tornquist, Carlos Gustavo

doi:10.1590/S0103-90162006000300008

SOILS AND PLANT NUTRITION

Elvio Giasson^I,*; Robin Thomas Clarke^II; Alberto Vasconcellos Inda Junior^I; Gustavo Henrique Merten^II; Carlos Gustavo Tornquist^III

*Corresponding author <giasson@ufrgs.br>

^IUFRGS - Faculdade de Agronomia - Depto. de Solos, C.P. 15100 - 90001-970 - Porto Alegre, RS - Brasil

^IIUFRGS - Instituto de Pesquisas Hidráulicas, C.P. 15029 - 91501-970 - Porto Alegre, RS - Brasil

^IIIUFRGS - Programa de Pós-Graduação em Ciência do Solo

ABSTRACT

Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes using multiple logistic regression to predict the occurrence of soil types from reference areas. From a digitized soil map and terrain parameters derived from a digital elevation model in the ArcView environment, several sets of multiple logistic regressions were fitted with the statistical software Minitab, establishing relationships between explanatory terrain variables and soil types, using either the original legend or a simplified legend, and with or without stratification of the study area by drainage classes. Terrain parameters such as elevation, distance to streams, flow accumulation, and topographic wetness index were the variables that best explained soil distribution. Stratification by drainage classes had no significant effect. Simplifying the original legend increased the accuracy of the method in predicting soil distribution.

Key words: GIS, DEM, soil survey, terrain analysis

INTRODUCTION

Soil surveys are recognized as important sources of information for land use planning and management. In Brazil, soil surveys for most of the country are available only at small scale (1:750,000), and just a small portion of the Brazilian territory has semi-detailed or detailed soil surveys because of funding limitations.

A more cost-effective alternative to traditional large-scale soil surveys would be to map the soils of representative areas within homogeneous regions and use soil-landscape relationships to predict soil distribution in non-surveyed areas (Schneider & Klamt, 1996). This approach is similar to the reference area method (Lagacherie et al., 2001), which is based on the hypothesis that a reference area containing most of the soil classes of a region can be sampled. Starting from this area, predicting soil distribution in other areas may be easier if the landscape is modeled by digital terrain analysis (Hengl & Rossiter, 2003) and the relationships between soils and landscape are also modeled.

Multiple logistic regression has been used successfully in soil science and related fields, for example to predict landslide hazard (Ohlmacher & Davis, 2003) or the probability of occurrence of soil drainage classes (Campling et al., 2002; Kravchenko et al., 2002), or to relate the presence of a non-calcareous clay-loam horizon to terrain attributes (King et al., 1999). Because soil map units are categorical variables, multiple logistic regression may be suitable for predicting the occurrence of soil classes from landscape variables, with the advantage that each prediction is associated with a probability of occurrence of each soil mapping unit.

Although previous works used logistic regression to estimate the occurrence of specific soil characteristics rather than soil taxonomic classes or mapping units, they suggest that logistic regression may have potential for producing soil maps from terrain parameters, based on the relationships between these parameters and soil occurrence. This study evaluates a new method of extrapolation based on landscape parameters by testing how well multiple logistic regressions can reproduce the soil map of a reference area.

MATERIAL AND METHODS

The study area was Sentinela do Sul county, located in southeastern Rio Grande do Sul, Brazil (UTM zone 22S, Easting from 434,000 m to 453,000 m and Northing from 6,596,000 m to 6,626,000 m). It covers 253 km², with elevation ranging from sea level to 525 m. The regional climate is subtropical (Köppen classification: Cfa II 1 d), with a mean annual temperature of 17.6 °C and a mean annual precipitation of 1,600 mm.

The relief is nearly flat in the alluvial plains, gently to strongly sloping on the colluvial deposits, and very steep in the highlands, the landform that dominates the landscape. There are two types of parent material: a granite-gneiss mixture with associated migmatite, and Cenozoic sediments, including sediments of gravitational and colluvial origin (Silveira, 1984). Recent and older fluvial deposits complete the sedimentary transition between the granite complexes and the coastal sediments on the eastern side of the county, which makes the distribution of parent materials complex.

A detailed reconnaissance soil survey of the county at 1:50,000 scale (Klamt et al., 1996), produced according to usual soil survey procedures that included extensive field work and photointerpretation, established eight mapping units (MU1, MU2, ..., MU8) (Table 1). Most MUs are combinations (undifferentiated groups, complexes, or associations) of two or more of the nine soil taxonomic units found in the area. This reflects the difficulty of separating individual soil classes as consociations (i.e., mapping units with only one dominant soil class), given the complexity of the soil-landscape relationships: each distinguishable landform usually contained several soil classes (Klamt et al., 1996).

The digital elevation model (DEM) had a resolution of 3 arc-seconds, corresponding to a pixel size of approximately 92 m, and was obtained from the USGS as SRTM (Shuttle Radar Topography Mission) data in SDTS format (Rabus et al., 2003). The DEM was used, directly or as a component, to calculate nine other soil prediction variables for each pixel: slope gradient, profile curvature, planar curvature, curvature (combination of planar and profile curvature), flow direction, flow accumulation, flow length, stream power index (SPI), and topographic wetness index (TWI) (Wolock & McCabe, 1995). Each of these landform parameters was tested as an explanatory variable because it was expected to reflect changes in soil-forming factors and was therefore believed to be informative about the occurrence of soil mapping units.
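The terrain derivatives listed above can be sketched for a small grid in Python with NumPy. This is a minimal illustration, not the ArcView 3.2 procedure used in the study: it assumes single-direction D8 flow routing (one of the algorithm families compared by Wolock & McCabe, 1995), computes slope by central differences, and uses the usual definitions TWI = ln(a / tan β) and SPI = a × tan β, where a is the specific catchment area (accumulated cells × cell size). The function names and the small floor on tan β that avoids division by zero on flat cells are hypothetical choices.

```python
import numpy as np

# Hypothetical helpers: a minimal sketch, not the ArcView 3.2 routines used in the study.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def slope_radians(dem, cellsize):
    """Slope gradient from central differences (one-sided at the grid edges)."""
    dz_drow, dz_dcol = np.gradient(dem, cellsize)
    return np.arctan(np.hypot(dz_drow, dz_dcol))


def d8_flow_accumulation(dem):
    """Number of cells (including the cell itself) draining through each cell,
    routing all flow to the steepest downslope neighbour (D8, an assumption)."""
    rows, cols = dem.shape
    receiver = np.full(dem.size, -1)  # -1 marks outlets, pits and flats
    for r in range(rows):
        for c in range(cols):
            best_drop = 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    drop = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                    if drop > best_drop:
                        best_drop = drop
                        receiver[r * cols + c] = rr * cols + cc
    acc = np.ones(dem.size)
    # Visit cells from highest to lowest so every upslope contribution is already summed.
    for i in np.argsort(dem, axis=None)[::-1]:
        if receiver[i] >= 0:
            acc[receiver[i]] += acc[i]
    return acc.reshape(dem.shape)


def wetness_and_power(dem, cellsize, min_tan_slope=1e-4):
    """TWI = ln(a / tan(beta)) and SPI = a * tan(beta), a = specific catchment area."""
    tan_beta = np.maximum(np.tan(slope_radians(dem, cellsize)), min_tan_slope)
    a = d8_flow_accumulation(dem) * cellsize  # upslope area per unit contour width
    return np.log(a / tan_beta), a * tan_beta
```

Because every D8 receiver is strictly lower than its donor, visiting cells in descending elevation guarantees that each upslope total is complete before it is passed on. Pits and flats are left unresolved here; production tools usually fill depressions first.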

Logistic regression was used to establish the relationships between these explanatory variables and soil distribution because it predicts probabilities of occurrence of soil mapping units from terrain variables, an advantage over other prediction techniques. The multiple logistic regression model relates a non-linear (logit) transformation of the class probabilities to a linear combination of the explanatory variables, which allows estimating the probability of occurrence of any number of classes of a dependent variable (in this case, soil mapping units) (Hosmer & Lemeshow, 1989).
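A common way to write the nominal (baseline-category) logistic model for eight mapping units is sketched below, assuming MU8 is the reference category, as the example equation log[p1/p8] in the Results suggests. Here x_1, ..., x_q are the terrain variables retained in a given model, the β are coefficients to be estimated, and p_k is the probability of mapping unit MUk; the eight probabilities sum to 1 by construction.

```latex
\eta_k = \log\frac{p_k}{p_8} = \beta_{k0} + \beta_{k1}x_1 + \cdots + \beta_{kq}x_q, \qquad k = 1, \dots, 7

p_k = \frac{e^{\eta_k}}{1 + \sum_{j=1}^{7} e^{\eta_j}}, \qquad p_8 = \frac{1}{1 + \sum_{j=1}^{7} e^{\eta_j}}
```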

Training data consisted of 7,500 random map observations (approximately one observation per 3.5 ha), each comprising the DEM value, the DEM-derived parameters, and the soil mapping unit from the soil survey. The entire area was not used for training so that the predictions could be tested on a data set different from the training set; here, the test set was the entire study area. Whereas Hengl & Rossiter (2003) used operator-selected representative sampling points, random points were used here to eliminate subjectivity and to make the procedure easy to reproduce. The data sampled in the ArcView 3.2 environment (ESRI, 1999) were exported as tables and analyzed statistically with Minitab version 11 (Minitab Inc., 1996).
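Random sampling of training points can be sketched as below, assuming all explanatory layers and the rasterized soil map share the same grid. The function name and the `seed` parameter are hypothetical; sampling is without replacement, so no pixel is drawn twice.

```python
import numpy as np


def sample_training_points(layers, soil_map, n_points, seed=None):
    """Draw n_points distinct random pixels. Returns the explanatory values (one
    column per layer), the mapped soil unit, and the flat pixel indices sampled."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(soil_map.size, size=n_points, replace=False)
    X = np.column_stack([layer.ravel()[idx] for layer in layers])
    y = soil_map.ravel()[idx]
    return X, y, idx
```

In the setting described in the text, `n_points` would be 7,500; keeping the returned indices makes it easy to check which pixels were used for training.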

The best sets of logistic regressions explaining soil distribution were selected for two prediction methods: 1) unstratified prediction, modeling the entire study area without stratifying it by drainage classes; and 2) stratified prediction, modeling the study area after stratifying it by drainage classes. The drainage classes were separated using the TWI, after applying a 3 × 3 pixel mean filter to the computed TWI map, with a threshold of TWI = 7.7: pixels below 7.7 were classed as well drained and pixels above it as poorly drained. The threshold was determined by testing classifications of the filtered TWI and choosing the one that best reproduced the drainage class groups represented on the original soil map.
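The drainage stratification can be sketched as a 3 × 3 moving average followed by a threshold at TWI = 7.7. Edge handling (border cells repeated) and assigning pixels exactly equal to 7.7 to the well drained stratum are assumptions; the text specifies neither. Function names are hypothetical.

```python
import numpy as np

TWI_THRESHOLD = 7.7  # threshold reported in the text


def mean_filter_3x3(grid):
    """3 x 3 moving average; edges handled by repeating the border cells (assumption)."""
    padded = np.pad(grid.astype(float), 1, mode="edge")
    rows, cols = grid.shape
    total = np.zeros((rows, cols))
    for dr in range(3):
        for dc in range(3):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total / 9.0


def poorly_drained_mask(twi, threshold=TWI_THRESHOLD):
    """True where the smoothed TWI exceeds the threshold (wetter, poorly drained stratum)."""
    return mean_filter_3x3(twi) > threshold
```

Separate regressions would then be fitted to the training points falling inside and outside the mask.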

Parameters of the multiple nominal logistic regressions were calculated in Minitab, using soil mapping unit as the response (dependent) variable, with eight nominal classes (MU1 to MU8). A stepwise procedure was used to obtain the best-fitting set of logistic regressions, starting with a larger number of variables and excluding those least related to variations in the response variable. For each prediction method, a set of best-fitting regressions was selected based on criteria such as goodness-of-fit tests (Pearson and deviance), log-likelihood, odds ratios, and the Z test (Hosmer & Lemeshow, 1989). One set of logistic regressions was determined for the unstratified prediction method, and two sets were selected for the stratified prediction method, for well drained and for poorly drained areas, respectively. These sets of logistic regressions were used to estimate the probability of occurrence of each soil mapping unit. They were implemented as scripts in the ArcView 3.2 environment, which assigned a probability value to each pixel of the study area and produced eight maps, each representing the probability of occurrence of one soil mapping unit over the entire study area. Using the Map Query function of the ArcView Spatial Analyst extension on the probability maps, two soil maps were estimated (one for the stratified and one for the unstratified procedure) by assigning to each pixel the soil mapping unit with the highest probability of occurrence on that pixel.
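Once coefficients are available, producing the probability maps and the most-probable-unit map amounts to evaluating a baseline-category logit for every pixel and taking the unit with the highest probability. The sketch below assumes the last unit (MU8) is the reference category; the function names and array layout are hypothetical, and the `argmax` stands in for the ArcView Map Query step rather than reproducing it.

```python
import numpy as np


def unit_probabilities(X, intercepts, coefs):
    """Baseline-category (nominal) logit with the last unit as reference.

    X: (n_pixels, n_vars) terrain values; intercepts: (K-1,); coefs: (K-1, n_vars).
    Returns an (n_pixels, K) array of probabilities whose rows sum to 1.
    """
    eta = intercepts + X @ coefs.T                  # log(p_k / p_K) for k = 1..K-1
    eta = np.column_stack([eta, np.zeros(len(X))])  # reference unit: log(p_K / p_K) = 0
    eta = eta - eta.max(axis=1, keepdims=True)      # shift rows to avoid overflow in exp
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)


def most_probable_unit(probs, labels):
    """Label each pixel with the unit of highest probability."""
    return np.asarray(labels)[probs.argmax(axis=1)]
```

For a whole raster, `X` would hold one row per pixel (the flattened layers stacked as columns), and each column of the returned array reshaped to the grid gives one probability map.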

The accuracy of the estimated soil maps was determined with error matrices (Congalton, 1991) comparing all pixels of the estimated maps with the original soil map. This used a larger dataset than the training points and was intended to evaluate the effect of using logistic regressions to extrapolate predictions of soil occurrence beyond the training points. Four map accuracy indicators were used: 1) overall accuracy, the number of correctly classified pixels divided by the total number of pixels; 2) producer accuracy, which measures how well each mapping unit of the original soil map was reproduced, i.e., the proportion of its pixels that were correctly classified; 3) user accuracy, which measures the reliability of the estimated map, i.e., the proportion of pixels assigned to a mapping unit that belong to that unit in the original map; and 4) Cohen's Kappa statistic (Cohen, 1960), which corrects for the agreement expected by chance between the original soil map (with and without legend simplification) and the estimated soil maps.
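The four indicators can be computed from an error matrix as sketched below, following the common convention of rows for the estimated map and columns for the reference map (Congalton, 1991). Class codes are assumed to be integers 0 to K-1, and the function names are hypothetical; a class absent from either map would cause a division by zero in the per-class accuracies.

```python
import numpy as np


def error_matrix(reference, estimated, n_classes):
    """Rows: estimated map classes; columns: reference (original) map classes."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(m, (estimated, reference), 1)  # count each (estimated, reference) pair
    return m


def accuracy_report(m):
    """Overall, producer and user accuracy, and Cohen's Kappa from an error matrix."""
    total = m.sum()
    correct = np.diag(m)
    overall = correct.sum() / total
    producer = correct / m.sum(axis=0)  # correct pixels / reference total per class
    user = correct / m.sum(axis=1)      # correct pixels / estimated total per class
    chance = (m.sum(axis=0) * m.sum(axis=1)).sum() / total**2  # expected agreement
    kappa = (overall - chance) / (1 - chance)
    return overall, producer, user, kappa
```

Applied to this study, `reference` would be the rasterized original soil map and `estimated` one of the predicted maps, both flattened over all pixels.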

RESULTS AND DISCUSSION

For the unstratified prediction, the multiple logistic regressions selected elevation, distance to streams, and TWI to explain the occurrence of soil mapping units; these variables are related to water accumulation and water table depth. For the stratified prediction, the variables selected to explain the occurrence of soil types in well drained areas were elevation, distance to streams, and slope, showing that soil occurrence in these areas is more related to factors that affect water movement and soil erosion. For poorly drained areas, the selected variables were elevation, distance to streams, and flow accumulation, indicating that soil distribution in these areas is more related to factors that affect soil drainage and water table depth (Table 2). These sets of best predictors agree with long-established relationships among soil-forming factors, landform, and soil distribution, which associate soil distribution with erosion processes in steep, well drained areas, and with water dynamics, such as water table depth (variable elevation) and water movement and accumulation, in poorly drained areas.

Table 2 presents three sets of equations: one for the unstratified area and two for the area stratified by drainage classes (well drained and poorly drained). Each set of equations allows calculating the probability of occurrence of each soil mapping unit. For example, the first column of Table 2 gives the following equation, where pk is the probability of occurrence of mapping unit MUk: log(p1/p8) = 10.073 - 0.9359 × TWI - 0.030455 × elevation + 0.0044564 × distance to streams.
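As a worked example, the MU1 equation quoted above can be evaluated for a single pixel. The pixel values and units (elevation and distance in metres) are assumptions for illustration. Because only the MU1 versus MU8 equation is quoted, the sketch yields the odds p1/p8 rather than p1 itself, which would require all seven equations of the set.

```python
import math

# Coefficients of log(p1/p8) quoted in the text (unstratified model, first column of Table 2).
MU1_VS_MU8 = {"intercept": 10.073, "twi": -0.9359,
              "elevation": -0.030455, "distance_to_streams": 0.0044564}


def log_odds_mu1_vs_mu8(twi, elevation, distance_to_streams):
    """Linear predictor of the MU1 vs MU8 equation."""
    c = MU1_VS_MU8
    return (c["intercept"] + c["twi"] * twi + c["elevation"] * elevation
            + c["distance_to_streams"] * distance_to_streams)


# Hypothetical pixel: TWI = 8, elevation = 100 m, 200 m from the nearest stream.
eta = log_odds_mu1_vs_mu8(8.0, 100.0, 200.0)
print(round(eta, 3), round(math.exp(eta), 2))  # 0.432 1.54
# Odds ratio for one extra unit of TWI, other variables held fixed:
print(round(math.exp(MU1_VS_MU8["twi"]), 3))   # 0.392
```

So at this hypothetical pixel MU1 is about 1.5 times as likely as MU8, and each additional unit of TWI multiplies those odds by roughly 0.39, which is the kind of odds ratio used among the model selection criteria.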

To evaluate how well the original soil map was reproduced, the agreement between the original soil map and the two newly generated maps (stratified and unstratified by drainage classes) (Figure 1) was assessed using error matrices (Table 3). Considering the major classification errors, without stratification MU1, MU3, MU5, and MU6 were underclassified, while MU4 and MU7 were overclassified. When the study area was stratified, MU2, MU4, MU7, and MU8 were overclassified and MU3 and MU6 were underclassified.

Attempts to classify the entire landscape never achieved better than 48% overall accuracy (Kappa = 36%), obtained without drainage class separation, while the map estimated with separation of drainage classes had an overall accuracy of 45% (Kappa = 31%). Overall accuracy and Kappa were considered unsatisfactory in both cases, although they are of the same magnitude as the values found by Hengl & Rossiter (2003). The best user accuracies were obtained for MU3 (61.5%), MU5 (54.5%), and MU7 (51.0%) when drainage classes were not separated, and for MU8 (86.9%), MU4 (85.4%), and MU6 (67.7%) when they were separated (Table 3).

For most soil mapping units, producer and user accuracies were higher when the study area was not stratified. Although higher accuracy was expected when separating areas by drainage classes, this was not observed. Given the characteristics of the SRTM DEM, its vertical precision may not have captured small variations in elevation, and its resolution may not have represented actual terrain variations over short distances. Hengl & Rossiter (2003) obtained significant differences when stratifying a study area into hill land and plains, but state that, although stratification has advantages, "it would be more practical to develop a single data set and predictive map of the entire area at once".

To improve the capacity to reproduce the original soil map, the legend was simplified by grouping similar mapping units at a higher taxonomic categorical level, to verify whether a simplified categorical legend, which would usually correspond to a reduction in cartographic scale, could be better predicted. The simplified legend merged mapping units MU2, MU3, and MU4 (reclassified as MU24) and mapping units MU6, MU7, and MU8 (reclassified as MU68). The simplified legend was thus formed by mapping units MU1, MU24, MU5, and MU68. The same procedures, with and without stratification of the area by drainage classes, were used to estimate the multiple logistic regressions and evaluate map accuracy.
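The legend simplification amounts to relabelling the reference map and training labels before the regressions are refitted; a minimal sketch of that lookup follows. The dictionary and function names are hypothetical, and the sketch does not merge fitted probabilities, since the text states the regressions were re-estimated.

```python
# Grouping described in the text; names are hypothetical.
SIMPLIFIED_LEGEND = {
    "MU1": "MU1",
    "MU2": "MU24", "MU3": "MU24", "MU4": "MU24",
    "MU5": "MU5",
    "MU6": "MU68", "MU7": "MU68", "MU8": "MU68",
}


def simplify_legend(units):
    """Relabel original mapping units with the simplified four-unit legend."""
    return [SIMPLIFIED_LEGEND[u] for u in units]


print(simplify_legend(["MU3", "MU5", "MU8", "MU1"]))  # ['MU24', 'MU5', 'MU68', 'MU1']
```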

For the maps estimated with the simplified legend (Figure 2), the procedure without stratification by drainage classes had an overall accuracy of 71% (Kappa = 54%), a relative increase of 48% over the same unstratified procedure with the original legend. The map estimated with stratification by drainage classes had an overall accuracy of 68% (Kappa = 51%), a relative increase of 51% over the stratified map with the original legend. Overall accuracy and Kappa were similar with and without stratification by drainage classes. Higher user accuracy was obtained with stratification by drainage classes for most mapping units, with values of 83.7% for MU24 and 70.5% for MU68. Higher producer accuracy was obtained without stratification for mapping unit MU24 (86.6%) (Table 4). Without stratification, the major errors were underclassification of MU1 and overclassification of MU68; with stratification, they were overestimation of MU1 and underestimation of MU68.
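The increases reported above are relative to the original-legend overall accuracies (48% and 45%), not differences in percentage points; in both cases the absolute gain is 23 points:

```latex
\frac{71 - 48}{48} \approx 0.48, \qquad \frac{68 - 45}{45} \approx 0.51
```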

Although a simplified legend makes the soil map lose precision (more soil classes are included in each map unit), it makes the map gain accuracy, i.e., the capacity to reproduce a reference soil map, whether the reference is the original field-surveyed map or a map with a simplified legend.

Received August 30, 2005

Accepted March 20, 2006

REFERENCES

CAMPLING, P.; GOBIN, A.; FEYEN, J. Logistic modeling to spatially predict the probability of soil drainage classes. Soil Science Society of America Journal, v.66, p.1390-1401, 2002.
COHEN, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, v.20, p.37-46, 1960.
CONGALTON, R.G. A review of assessing the accuracy of classification of remotely sensed data. Remote Sensing of Environment, v.37, p.35-46, 1991.
ENVIRONMENTAL SYSTEMS RESEARCH INSTITUTE - ESRI. ArcView 3.2. Redlands, California, 1999.
HENGL, T.; ROSSITER, D.G. Supervised landform classification to enhance and replace photo-interpretation in semi-detailed soil survey. Soil Science Society of America Journal, v.67, p.1810-1822, 2003.
HOSMER, D.W.; LEMESHOW, S. Applied logistic regression. New York: John Wiley & Sons, 1989. 307p.
KING, D.; BOURENNANE, H.; ISAMBERT, M.; MACAIRE, J. Relationship of the presence of a non-calcareous clay-loam horizon to DEM attributes in a gently sloping area. Geoderma, v.89, p.95-111, 1999.
KLAMT, E.; SCHNEIDER, P.; KÄMPF, N.; GIASSON, E.; BASTOS, C.A.B.; LIMA, V.S.; GREHS, S.A.; HESSELN, N.E. Levantamento de reconhecimento de alta intensidade dos solos do município de Sentinela do Sul, RS. Porto Alegre: Departamento de Solos, UFRGS, 1996. (Relatório Técnico).
KRAVCHENKO, A.N.; BOLLERO, G.A.; OMONODE, R.A.; BULLOCK, D. Quantitative mapping of soil drainage classes using topographical data and soil electrical conductivity. Soil Science Society of America Journal, v.66, p.235-243, 2002.
LAGACHERIE, P.; ROBBEZ-MASSON, J.; NGUYEN-THE, N.; BARTHÈS, J. Mapping of reference area representativity using a mathematical soilscape distance. Geoderma, v.101, p.105-118, 2001.
MINITAB INC. Minitab User's Guide Release 11 for Windows. State College: Minitab Inc., 1996.
OHLMACHER, G.C.; DAVIS, J.C. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Engineering Geology, v.69, p.331-343, 2003.
RABUS, B.; EINEDER, M.; ROTH, A.; BAMLER, R. The shuttle radar topography mission - a new class of digital elevation models acquired by spaceborne radar. ISPRS Journal of Photogrammetry and Remote Sensing, v.57, p.241-262, 2003.
SCHNEIDER, P.; KLAMT, E. Necessidades e perspectivas em levantamentos de solos no Rio Grande do Sul. In: SIMPÓSIO BRASILEIRO SOBRE ENSINO DE SOLOS, 2., Santa Maria, 1995. Proceedings. Santa Maria: SBCS, UFSM, 1996.
SILVEIRA, R.J.C. da. Variabilidade das características dos solos e relações solo-superfícies geomórficas na Encosta do Sudeste do Rio Grande do Sul. Porto Alegre: UFRGS, 1984. 139p. (Dissertation - M.Sc.).
WOLOCK, D.M.; McCABE, G.J. Comparison of single and multiple flow-direction algorithms for computing topographic parameters in TOPMODEL. Water Resources Research, v.31, p.1315-1324, 1995.

Publication Dates

Publication in this collection
26 June 2006
Date of issue
June 2006

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
