SciELO - Scientific Electronic Library Online

Scientia Agricola

On-line version ISSN 1678-992X


VALADARES, Alan Pessoa; COELHO, Ricardo Marques and OLIVEIRA, Stanley Robson de Medeiros. Preprocessing procedures and supervised classification applied to a database of systematic soil survey. Sci. agric. (Piracicaba, Braz.) [online]. 2019, vol.76, n.5, pp.439-447. Epub May 20, 2019. ISSN 1678-992X.

Data mining techniques play an important role in predicting the spatial distribution of soils in systematic soil surveying, though existing methodologies still lack standardization and their capabilities are not fully understood. The aim of this work was to evaluate the performance of preprocessing procedures and supervised classification approaches for predicting map units from 1:100,000-scale conventional semi-detailed soil surveys. Three 1:50,000-scale sheets of the Brazilian National Cartographic System were used for developing models: “Dois Córregos” (“Brotas” 1:100,000-scale sheet), and “São Pedro” and “Laras” (“Piracicaba” 1:100,000-scale sheet). Soil map information and predictive environmental covariates for the dataset were obtained from the semi-detailed soil survey of the state of São Paulo, from the Brazilian Institute of Geography and Statistics (IBGE) 1:50,000-scale topographic sheets and from the 1:750,000-scale geological map of the state of São Paulo. The target variable was the soil map unit, expressed in four ways: the local “soil unit” name and the soil class at three hierarchical levels of the Brazilian System of Soil Classification (SiBCS). Different data preprocessing treatments and four algorithms, each based on a different approach, were tested. Results showed that composite soil map units were not adequate for the machine learning process. Class balancing did not improve the performance of the classifiers. Random Forest, the best-performing algorithm, reached an accuracy of 78 % and a Kappa index of 0.67 after the preprocessing procedures. Information from conventional map units of the semi-detailed (4th order) 1:100,000 soil survey generated models whose accuracy, precision, sensitivity, specificity and Kappa index values support their use in programs for systematic soil surveying.
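
For readers unfamiliar with the two headline metrics, the sketch below shows how overall accuracy and Cohen's Kappa index are conventionally derived from a confusion matrix, using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is an illustration only, not the authors' code: the function name `accuracy_and_kappa` and the three-class matrix are hypothetical, and the numbers are made up rather than taken from the study.

```python
import math


def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement p_o: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement p_e: sum over classes of the product of the
    # row (true) and column (predicted) marginal proportions.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        # Every sample falls in a single class: kappa is undefined.
        return observed, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix for three map units (illustrative values only).
toy_matrix = [
    [40, 5, 5],
    [6, 30, 4],
    [4, 6, 20],
]
acc, kappa = accuracy_and_kappa(toy_matrix)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Because Kappa discounts the agreement that would arise by chance from the class proportions alone, it is usually lower than raw accuracy when some map units dominate the dataset, which is why the two figures are reported together.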

Keywords: machine learning algorithms; random forest; tacit soil-landscape relationships; digital soil mapping.
