Digital Soil Mapping Using Machine Learning Algorithms in a Tropical Mountainous Area

Increasingly, applications of machine learning techniques for digital soil mapping (DSM) are being used for different soil mapping purposes. Considering the variety of models available, it is important to know their performance in relation to soil data and environmental variables involved in soil mapping. This paper investigated the performance of eight machine learning algorithms for soil mapping in a tropical mountainous area of an official rural settlement in the Zona da Mata region in Brazil. Morphometric maps generated from a digital elevation model, together with Landsat-8 satellite imagery, and climatic maps, were among the set of covariates to be selected by the Recursive Feature Elimination algorithm to predict soil types using machine learning algorithms. Mapping performance was assessed using the confusion matrix, and the Z-test among the Kappa indexes of the matrices. In a conventional soil survey, the soils described and classified in the Brazilian System of Soil Classification [Argissolos Vermelho-Amarelos Distróficos – PVAd (Acrisols), Cambissolos Háplicos Tb Distróficos CXbd (Cambisols), Gleissolos Háplicos Háplicos Tb Distróficos GXbd (Gleysols), Latossolos Amarelos Distróficos LAd (Xanthic Ferralsos), Latossolos Vermelho-Amarelos Distróficos LVAd (Rhodic Ferralsols), and Neossolos Litólicos Distróficos RLd (Neossols)] were grouped into composite mapping units (MU) using the conventional method. The eight algorithms showed similar performance without statistical difference (Kappa 0.42-0.48). The mapping of soils with varying slopes (LAd, LVAd, CXbd) showed lower accuracy, whereas soils on hydromorphic lowlands (GXbd) were classified more accurately. In map algebra, the result was rather satisfactory, with 63-67 % agreement between the conventional soil map and maps produced by machine learning. The areas with the largest disagreement in the DSM occurred in the LAd unit due to subtle color variation in the Latossolos mantle without a clear relation to any environmental variable, highlighting difficulties in DSM regarding hill slope landforms. Model performance was satisfactory, and good agreement with the conventional soil map demonstrates the importance of the DSM as a potential complementary tool for assisting soil mapping in mountainous areas in Brazil for the purpose of land use planning.


INTRODUCTION
Soil mapping is key for guiding decision makers in natural resource assessments, environmental modelling, and land use studies.The soil mapping process requires the knowledge and experience of a senior pedologist for the stages of soil surveying, soil classification, and soil mapping (Kempen et al., 2012;Resende et al., 2014).For mapping soils, the pedologist makes use of environmental information, such as climate, lithology, vegetation, and landforms; analyses and interpretations of soil samples collected in the area of interest; and consideration of all elements involved in soil-landscape dynamics (Nolasco-Carvalho et al., 2009;Brady and Weil, 2013;Schaefer, 2013).Through a systematic analysis of field observation and supporting data during the mapping, the pedologist identifies the relationships between soils and landscape features and establishes the soil mapping unit boundaries to be drawn on the map (Nolasco-Carvalho et al., 2009;Brevik et al., 2016).
The Digital Soil Mapping (DSM) technique has shown great potential in producing spatial information (McBratney et al., 2003).Basically, the DSM allows maps of soil spatial distribution to be created by means of numerical models that take soil environmental covariates into account, allowing inferences to be made of spatial and temporal variations of soil types and their properties (Lagacherie and McBratney, 2007).Digital Soil Mapping studies make use of spatially dense maps of covariates, related to soil formation and processes, that are usually available.
In a recent review on the use of DSM for soil mapping in Brazil (ten Caten et al., 2012), the approaches used up to 2011 show three main classification models applied for DSM (ANNs, logistic regression, and decision tree).Up to then, studies had not used open source software, such as SAGA and R. In the following years, due to the development and free sharing of codes for soil modeling on open source platforms, several models run in R software became popular in research dealing with soil mapping.After 2011, Logistic regression continued to feature among the most used models for mapping soil classes in Brazil (Souza, 2013;Vasques et al., 2015;Chagas et al., 2017;Jeune et al., 2018), along with the following: ANNs (Arruda et al., 2016), maximum likelihood (Demattê et al., 2016), fuzzy logic and expert knowledge (Silva et al., 2014), and tree-based models (Bazaglia Filho et al., 2013;Höfig et al., 2014;Rizzo et al., 2016).
Studies aiming to compare machine learning algorithms have recently increased (Collard et al., 2014;Brungard et al., 2015;Taghizadeh-Mehrjardi et al., 2015;Heung et al., 2016;Chagas et al., 2017;Mosleh et al., 2017;Zeraatpisheh et al., 2017;Jeune et al., 2018), enabling comparison and identification of algorithms with the best performance for specific soil-landscape conditions.Model performance has been reported as showing differences due to classifier parameters and configurations (Brungard et al., 2015;Heung et al., 2016;Chagas et al., 2017;Jeune et al., 2018), as well as the sampling density within soil classes (Collard et al., 2014;Brungard et al., 2015;Zeraatpisheh et al., 2017) and the taxonomic level of soil classification (Taghizadeh-Mehrjardi et al., 2015;Mosleh et al., 2017;Zeraatpisheh et al., 2017).In addition to these factors influencing model performance, contrasting results in creating spatial soil information may be intrinsic to soil-landscape characteristics that directly relate to available datasets and models used for mapping, and studies have not yet fully covered the diversity of soils across Brazilian territory.
The aim of this paper was to investigate the performance of machine learning algorithms on soil mapping in a mountainous area of the Zona da Mata region, state of Minas Gerais, Brazil.The main objectives were: 1) to perform soil mapping on a 1:50,000 scale using a conventional method to serve as a reference map for comparison with soil maps of DSM models; 2) to select the most relevant set of environmental covariates to generate soil class maps using machine learning algorithms; and 3) to compare machine learning algorithms for soil mapping in a hilly, dissected tropical landscape.

Study area
The present study was carried out in the area of the Dênis Gonçalves Settlement of Agrarian Reform (21° 34' 30" S and 43° 12' 33" W, and 21° 39' 36" S and 43° 09' 15" W), with a total area of 4,213.60 hectares.The area occupies lands in the municipalities of Goianá, Chácara, Coronel Pacheco, and São João Nepomuceno, located in the mesoregion of the Zona da Mata of the Minas Gerais State, Brazil (Figure 1) from 409 to 928 m altitude.
The climate in the region is classified as Cwb (Köppen classification system), subtropical, and mesothermic, with mild summers, and rainfall concentrated from October to April, and cold and dry winters (May to September) (Alvares et al., 2013).The mean annual temperature is 18.7 °C, and mean annual rainfall is 1,528 mm.It is located in the Atlantic Forest biome, formerly a semi-deciduous forest area.Two main rivers cross through the settlement: the Cágado River, in the southern part of the border The settlement is split along the middle by a group of mountains named "Serra da Babilônia", which forms three distinct environments: the first one, the "Serra", with rocky outcrops steep in the east-west direction.This is a typical structural environment, favoring the emergence of a truncated hydrography in the southern part.Towards the north, the landscape is dominated by low hills (morrotes) with gently rolling to rolling slopes and elongated valley bottoms, dominated by meandering streams that create hydromorphic environments dominated by Gleysols.The third environment is located to the south of the mountain range, with predominance of high, mountainous topography and flat valley bottoms, embedded, and elongated.These landscapes are characteristic of the Zona da Mata region, which usually has low fertility soils covered with pastures, coffee fields, and forests (Resende et al., 2014) and highly dissected landscapes of the "mares de morros" (sea-of-hills) (Ab'Sáber, 1975;Silva and Mello, 2011).In this area, rivers generally have terraces and alluvial plains (Silva and Mello, 2011;Schaefer, 2013;Resende et al., 2014).

Soil survey and soil mapping unit composition
A total of 20 soil pits were opened for complete morphological description and sample collection.Additionally, 124 pits (extra samples) were collected (mini-pits and auger samples), characterized, and classified, for a total of 144 sites of soil sampling and description (Figure 1).The soils were surveyed in a systemic way throughout the area, considering the existing tracks and roads within the geoenvironments, stratified by a senior pedologist.In addition, the settlement was stratified into regions according to knowledge obtained from settler families (Table 1).
Soil sampling and description was carried out according to the IBGE (2015).To define the sampling sites, the area was stratified into geoenvironmental units (lowlands and terraces, hydromorphic plains, upland tops, and colluvial slopes).
The soil samples were subjected to physical and chemical analyses according to Claessen (1997) and classified at the third level of classification, according to the Brazilian Soil Classification System -SiBCS (Santos et al., 2013).Total 124 (1) Brazilian System of Soil Classification (SiBCS).
(2) International soil classification system (IUSS Working Group WRB, 2015).Full profile = sites where the soil pits were opened for sampling and morphological description of all pedogenetic horizons; Extra sample = sites where soil pits were opened for characterization and description.
Rev Bras Cienc Solo 2018;42:e0170421 The mapping units (MUs) were defined by grouping areas of soils on a 1:50,000 scale, according to the Technical Handbook of Pedology (IBGE, 2015).The MUs were delineated using visual interpretation of the soil pattern distribution in the area.Soil-landscape relationships were established using visual interpretation of field observations, soil descriptions, and the following subsidiary maps: 1) satellite imagery (GeoEye sensor obtained from the Google Earth platform, with spatial resolution of 0.41 m); 2) digital elevation model -DEM [generated from Advanced Land Observing Satellite (ALOS) images with 12.5 m of spatial resolution]; and 3) maps derived from the DEM (i.e., slope, elevation, solar radiation, aspect).

Selection of covariates for digital soil mapping
Due to the large amount of data available for use as covariates in modelling, it was necessary to use a data-mining technique to select the most suitable dataset as an optimal set of predictors to run the model, affording the lowest error.
Rev Bras Cienc Solo 2018;42:e0170421 We used a selection process where the covariates were ranked based on their importance, selecting the top ten with highest importance for soil mapping.The selection procedure was performed by using the Recursive Feature Elimination (RFE) algorithm, implemented on the Caret Package (Kuhn, 2017), combined with correlation analyses for removal of covariates with 95 % or higher correlation with others.
Recursive Feature Elimination is an algorithm that performs a backward selection, which avoids refitting many models at each step of the search (Kuhn and Johnson, 2013).When the full model is created, a measure of variable importance is computed and shows the ranks of predictors from most to least important (Kuhn, 2013).In this study, we used the Random Forest based RFE, which ranks the predictors based on the Gini index, as implemented in the Random Forest algorithm (Breiman, 2001).
The selection was run in two phases: the first, in which the highly correlated covariates were identified and the ones with the highest correlation removed (Kuhn, 2017), and the second, in which an importance rank was determined by RFE, allowing selection of the most important ones.In the first phase, categorical covariates were left out, and continuous ones were used in a pair-wise analysis that was run using the 'findcorrelation' function, as implemented in the Caret Package (Kuhn, 2017) to determine the correlation among covariates without their being assessed in a prediction model.A correlation matrix was generated, and the absolute values of pair-wise correlations were considered in the analyses.In these analyses, if two covariates have a high correlation, the function looks at the mean absolute correlation of each covariate and removes the covariate with the largest mean absolute correlation.A threshold value greater than or equal to 98 % was established as the critical value for the covariate to be removed.Once the dataset had been reduced by removing highly correlated covariates, the RFE procedure was run, establishing 10 as the maximum number of covariates to be left in the model.
Recursive Feature Elimination algorithm selection is an iterative process that eliminates the least important predictors from the model based on an initial measure of predictor importance (Kuhn and Johnson, 2013).We selected Random Forest as the base model for running the RFE.First, a full Random Forest model is created using all covariates, and a measure of covariate importance is computed to rank the covariates from most to least important.Secondly, the Random Forest model is tuned and trained iteratively with the most important covariates and without the least important ones until only 10 covariates remain.

Machine Learning Algorithms
Eight algorithms were assessed: Random Forest (RF), Extreme Gradient Boosting (xgBoost), Ranger Random Forest (Ranger), Weighted Subspace Random Forest (WSRF), Support Vector Machine with Linear Kernel (SVMLinear), Support Vector Machine with Polynomial Kernel (SVMPoly), Bagged AdaBoost (AdaBag), and Extra Trees -Random Forest by randomization (ExtraTree).All algorithms were implemented in the Caret package (Kuhn, 2013) and run on the R program (R Development Core Team, 2016) using the available parameters.

Random Forest (Ranger), Weighted Subspace Random Forest (WSRF), Random Forest by randomization (Extra Trees)
Random Forest (RF) is a general term used for a set of "tree-based" classifiers.The method was developed as an extension to the regression tree classifier (CART) to improve model performance (Breiman, 2001) and applied to predictions of discrete and categorical variables.
The working principle of RF operation is the same as that used in CART, except that many trees are constructed, resulting in a "model of forests" and, in addition, during growth of the trees there is no pruning.For each tree, only a subset of the prediction variables is Rev Bras Cienc Solo 2018;42:e0170421 used.The number of predictors used in the construction of each tree and the number of trees to be constructed in the forest vary, depending on the dataset (Liaw and Wiener, 2018).Each tree is constructed from samples selected from the total dataset by using the bootstrap method (Efron and Gong, 1983).This selection allows more robust error estimates with a second set of samples, called Out-of-Bag (OOB), also selected from the total dataset by the bootstrap and excluded from the prediction.
Variant models have been developed from the principle of RF operation [e.g., Ranger (Wright and Ziegler, 2015;Wright, 2017), WSRF (Meng et al., 2012), and ExtraTrees (Simm and Abril, 2014)].Ranger has shown to be efficient in soil nutrient mapping on an ensemble model by Hengl et al. (2017a) and in soil properties by Hengl et al. (2017b).
The Weighted Subspace Random Forest (WSRF) is an algorithm that can classify large datasets using a mechanism similar to RF, but applied to small subsets.It is a new method of weighting variables used in sample selection, rather than using the traditional method of random selection.This new approach is particularly useful in models using large dimension datasets (Meng et al., 2012).However, the error effect of setting the weights when using low dimension datasets is substantial and may result in poor performance of the model (Li and Zhao, 2009).
The use of ExtraTree (Extremely Randomized Trees) is similar to RF.The difference is that whereas each branch of the forest in RF chooses the best cut-off threshold for categorical values, ExtraTree chooses the cut-off value (evenly) randomly.However, like RF, the categorical data with the highest gain (or better score) is chosen after the cutoff limit is set (Simm and Abril, 2014).

Support Vector Machine (with Linear Kernel/with polynomial Kernel)
The Support Vector Machine (SVM) algorithm, introduced by Cortes and Vapnik (1995), performs a binary classification.The basic idea of the model is to find a hyperplane that best separates the points of two classes by maximizing the margin between those points of both classes that are closest to each other, called support vectors.
The SVM function can easily work with non-linear patterns by transforming the original data into new features, which is a basic characteristic of the kernel function.This hyperplane from the maximized margins is used as a criterion for subsequent classification (Kuhn, 2013).
The SVM has been applied to soil mapping due to its ability to handle large datasets, learning complex data-classes and making decisions regarding separation, and also because it is based on traditional statistical methods applied to pedometric studies (Brungard et al., 2015;Brevik et al., 2016;Heung et al., 2016;Forkuor et al., 2017).
In addition, the SVM performs well in low-dimension datasets and has high power for generalizing information (Li and Zhao, 2009).
To run the SVM, we used the kernlab package (Karatzoglou et al., 2016).The SVM with radial basis function (SVMRadial) uses machine learning and kernel functions for pattern recognition.

Bagged AdaBoost (AdaBag)
AdaBag is an algorithm that makes possible to implement two popular tree-based models: boosting and bagging.The main difference between these two ensemble model methods is that boosting builds its base classifiers in sequence, updating a distribution over the training examples to create each base classifier, whereas bagging combines individual classes in repetitions of bootstrap training (Alfaro et al., 2015).
Adaboost can process weighted data, and the classification error rate for each test is used to update the distribution over the training samples.The weights of the classification Rev Bras Cienc Solo 2018;42:e0170421 errors of the samples are increased, while the weights of the correct classifications decrease, controlling the error rate and forcing the classifier to focus on more difficult samples (Alfaro et al., 2013).

Extreme Gradient Boosting (xgBoost)
Extreme Gradient Boosting (xgBoost) is a hybrid algorithm of the tree model, applied to regression and classification predictions.It is an ensemble model that works based on prediction of the weak set, which is highly effective and widely used (Chen and Guestrin, 2016).Its scalable working mechanism is based on stages with increases in the number of trees.A differential aspect of the model is its ability to handle sparse data.On soil science, the model has been applied to mapping soil nutrients (Hengl et al., 2017a) and soil properties in the United States (Ramcharan et al., 2018).
The model is optimized by adjusting parameters, such as the number of interactions, the maximum number of tree branches, the multiplication parameters of tree shrinkage after each interaction, reduction in the influence of each tree, the amount of space for future trees, minimum number of weighted input features, the sum of the instances, and the percentage of samples for sampling (Chen et al., 2017).The model is trained in an additive way, using learned patterns to redefine the parameters (Chen and Guestrin, 2016).

Model training and evaluation
The soil taxonomic classes described at 144 sites (Table 1) were used as the dataset for training and validating the models.The number of soil sites per MU was as follows: CXbd (38), GXbd (18), LAd (31), LVAd (23), PVAd ( 19), and RL (15).Six MUs with a single soil component were defined within the settlement area for the soil mapping using machine learning.This soil mapping approach followed the guideline for a 1:50,000 scale of soil mapping, considering the detail of spatial information and the soil survey approach (IBGE, 2015).
Model training used 10-fold cross-validation repeated 5 times.This method is suggested as the most weighted in evaluation of model performance when the datasets are not large enough to be split into training and external validation samples (Kuhn, 2013).
To evaluate the performance of the algorithms for soil mapping, the confusion matrix was used to derive the Kappa indexes and the overall accuracy.The matrix is constructed using a discrete multivariate technique to evaluate thematic accuracy (Congalton and Green, 2009).The Kappa coefficient (K) is a measure of actual agreement minus chance agreement.In other words, it is a measure of how much the classification data agrees with the reference data, and it can be calculated as follows: Eq. 1 in which K is the estimate of the Kappa; n ii is the value in row i and column i; n i+ is the sum of line i, and n +i is the sum of column i of the confusion matrix; n is the total number of samples; and c is the total number of classes.The Kappa index ranges from 0 to 1, and performance is considered moderate for values between 0.20-0.60(Landis and Koch, 1977).
To perform a statistical analysis for detecting difference between the Kappa obtained by the algorithms, when mapping the soil, we used the Z-test, which compares pairs of algorithms according to the following equation: Eq. 2 Rev Bras Cienc Solo 2018;42:e0170421 In which Z is the value that defines the test, K 1 and K 2 are the Kappa values of algorithms 1 and 2 taken for the comparison; and "var" is the variance of Kappa algorithms 1 and 2, respectively.The Z value makes possible to evaluate the difference between classifiers in performing the mapping with the available data.The statistical significance test follows the null hypothesis considered in this study, where there is a 5 % confidence interval, with a Z value greater than 1.96, which is significant (Congalton and Green, 2009).

Variability of Mapping Units among DSM models
To identify areas in the map where the machine learnings showed greater variability in mapping the soil MUs, we analyzed the eight maps generated from the algorithms, performing zonal statistical analysis with the Zonal statistical function from the "Spatial Analyst Tools" package of the ArcGis 10.2 software.Variability analysis runs a per-pixel calculation among the soil maps to calculate the MU "Variety" (i.e., calculate the number of unique MUs at each pixel).If the classifiers agree on mapping a MU at one pixel, the function returns the number one (one unique MU); for two different MUs assigned, it returns the number two, and so forth.Hence, the larger the number of classifier disagreement, the larger the mapping variability.

Analysis of agreement between conventional map and digital soil maps
Concordance analysis between the conventional soil map and the soil maps from the machine learning algorithms was performed to give a comparative measure in terms of area mapped for each MU.The soil in the conventional map was converted to a raster file and analyzed by map algebra using Boolean logic.To calculate pixel-by-pixel agreement, a concordant pixel returns the value 1 and a non-concordant one returns the value zero.
As the soil map was composed by MUs with soil associations up to three soil classes, the map was reclassified for maps of a single MU component (i.e., soil map of the first, second, and third soil class component of the MU).For each DSM map comparison, the concordance with the three maps of MU components were added up, and agreement was expressed as the area of matching soil classes on both maps.
Latossolos cover a large area, corroborating previous soil surveys in a nearby region (the Coronel Pacheco research station) (Santos et al., 1980), whereas Argissolos occur as inclusions or in association with Latossolos.At the hilltops, Latossolos are most developed and easily distinguishable and mapped on flat or convex slopes in the uplands.On a landscape scale, color distinction at the second level is difficult due to the small difference between the 5YR and 7.5YR; hence, we mapped associations between these Latossolos classes (Figure 2).
Shallow Cambissolos on deep saprolites are also frequent in the area, also mapped in association, mainly occurring on steep concave slopes, where erosion is greater than pedogenesis (Schaefer et al., 2013).
Rev Bras Cienc Solo 2018;42:e0170421 In the lowlands, the typical MU is the association of Gleissolos and Cambissolos (GXbd2), identified on the satellite image with a gray pattern indicating seasonal flooding, whereas Gleissolos (GXbd1) are found on the darkest areas of the image, indicating swampy areas of permanent waterlogging.
The areas of highland ridges on the steep slopes of the Serra da Babilônia with sparse rocky outcrops contain Neossolos Litólitcos (RL), which are associated with shallow Cambissolos, since soils in these areas are difficult to map separately due to the scale adopted (1:50,000).

Covariates selected for soil mapping using machine learning
The ten covariates selected for soil mapping, ranked by importance, were as follows: 4 morphometric covariates (slope height, standardized height, TWI, and MRVBF); 3 images from Landsat-8 (bands 1, 7, and 11); two climatic maps [rainfall of the coldest trimester (BIO19) and the annual temperature range (BIO07), which stands for the maximum temperature of the warmest month, minus the minimum temperature of the coldest month]; and the map of Euclidean distance from the drainage network (eddrainage).The high number of covariates related to landform is due to the influence of relief on the soil formation process in the area, where topography and landform are key factors for soil diversity.
The contribution of each covariate to the soil-mapping model (Figure 3), measured as the percentage of decreasing accuracy, highlights the morphometric covariates as highly important in mapping soil distribution in the study area.The Multi-resolution index of valley bottom flatness (MRVBF) is the most important, contributing 21.6 %.
The TWI contributed 15.5 % and slope height 13.6 %.The two bioclimatic covariates, related to temperature and rainfall, scored around 17.5 % each, and were the second more important.The covariates selected were derived from a large variety of sources (Figure 3), including spectral information from satellite imagery, Euclidian distance from the drainage network, and bioclimatic and morphometric sources.

Evaluation of soil mapping with machine learning algorithms
Soil mapping using the machine learning algorithms exhibited a Kappa index ranging from 0.42 to 0.48 and an overall accuracy ranging from 0.54 to 0.58 (Table 2), which can Rev Bras Cienc Solo 2018;42:e0170421 be considered moderate (Landis and Koch, 1977).Overall, the xgBoost showed better performance, with a higher Kappa (0.48).It had low performance in mapping Latossolos Vermelho-Amarelos (39 %) and high performance in mapping Gleissolos (85 %).The ExtraTree, in contrast, was less accurate in mapping Argissolos (41 %) but showed high performance in mapping Gleissolos (94 %), like the RF, which mapped Gleissolos with 94 % accuracy.
The RF algorithm showed low efficiency in mapping CXbd (28 % accuracy).The MU CXbd were not adequately distinguished from Latossolos (LAd and LVAd units), although CXbd has more soil data (38) than Latossolos (LAd -31 and LVAd -23 soil pits) (Table 2).The area mapped as Cambissolos is also larger than areas of Latossolos.When comparing the algorithms performance on mapping the LVAd, the lowest accuracy was that of the SVMPoly, with not a single classification of this soil correctly allocated to this MU (Table 2); the best performance on mapping LVAd was achieved by the ExtraTree algorithm (43 %).
Overall, the algorithms showed moderate performance in distinguishing CXbd from Latossolos (LAd and LVAd).Conversely, all algorithms showed good performance in mapping GXbd, in which the lowest accuracy was 82 % by the AdaBag algorithm, and the highest 94 % (RF, Ranger, and ExtraTree).These results indicate that the covariates used made it difficult to distinguish some soils on dissected terrain (Table 2).
The soil maps obtained with each algorithm are shown in figure 4. In a cursive rapid visual analysis, no significant differences are noticed, except for the LVAd, which was mapped with the smallest area by the SVMPoly (Figure 4f), since the SVMPoly misclassified LVAd as CXbd (Table 2b).Similarly, confusing LVAd with CXbd occurred with all algorithms.Two MUs, CXbd, and LVAd, showed highest standard deviation in the size of area mapped (Table 3) comparing all the DSM maps, as a result of their confusing and difficult mapping.The results showed that the SVMLinear has an advantage over the SVMPoly.
Most MUs were mapped in a similar way for all algorithms, especially in the highlands, where RLd occurs (standard deviation of area mapped = 0.4 %).The MU LAd was mapped in a similar way by all algorithms, as well as GXbd, which occurs near the drainage network (Figure 4), and showed standard deviation of 0.9 %.Among algorithms, the mapping of PVAd showed more dissimilarity in its spatial distribution and mapped area in the SVMPoly mapping, which showed only 5.3 % of the area occupied by this soil type, whereas four algorithms showed a percentage of PAVd of around 8-9 % of the settlement.The SVMLinear had the worst performance, which is consistent with Brungard et al. (2015), who made the same observation.In this respect, in a study performed by Heung et al. (2016), the SVMRadial was superior to the SVMLinear in models of soil mapping at the order and suborder level.2016) also compared soil mapping in great group and order level classifications in a valley area in Vancouver, Canada, using ten machine learning algorithms.The authors found that RF was among those of highest performance, along with CART and the SVMRadial.In their study, Multinomial Logistic Regression (MLR) showed lower overall accuracy (0.42) than RF (0.63) for soil orders.Therefore, statistical difference was not investigated.Only Jeune et al. (2018) reported a statistical difference in Kappa in soil mapping in Haiti, in which RF showed a much higher Kappa (0.55) compared to MLR (Kappa = 0.33) using external validation rather than cross-validation.

Heung et al. (
A similar study using a tree-based model (i.e., Random Forest -RF) for soil mapping in northern Iran (Pahlavan-Rad et al., 2016) applied both RF and Multinomial Regression models in mapping soils classified at the great group, subgroup, and family level and achieved highest accuracy with RF (33.9) and MLR (22.9) at the subgroup level.Similar accuracy for MLR was observed by Souza (2013) for mapping soil classes in the Rio Doce Basin, Brazil (Kappa = 0.35).
Following a different approach to obtaining soil data, Chagas et al. (2017) used legacy data extracted from a map for training three machine learning algorithms, obtaining highest accuracy for Random Forest (Kappa = 0.76).
The similarity in performance of machine learning algorithms observed in the present study and in the literature suggests that the performance of machine learning for soil mapping is rather similar when evaluating the Kappa index with the Z-and t-test, highlighting that the quality and robustness of datasets is of greater importance than the classifier itself.Furthermore, the number of samples per MU and the level of taxonomic classification of the soil are of key importance.As Brungard et al. (2015) pointed out, the number of pedons within each MU directly influences the accuracy of the model.Although we did not test sampling size per MU, the fact that each model scored similar Kappa for the eight models assessed, while within each model, the accuracy of the MUs did not follow the trend, indicates that for some soils the sample size should be larger in order to model their complex distribution across the landscape.In this regard, Gleissolos and Neossolos, with a limited dataset compared to LVAd, scored higher accuracy.
The number of MUs in the conventional soil map (7) differs from the DSM maps ( 6), as well as the number of soil components per MU.In the conventional map, there are two MUs made up of two soil components and two made up of three components, while only two MUs have a single component.In the DSM maps, all MUs are made up of a single soil component.Therefore, the MUs GXbd1 and GXbd2, and LAd1 and LAd2 on the conventional map were combined to allow better comparison with the MUs mapped with the machine learning algorithms (Table 3).The correspondence among MUs of these two mapping approaches demonstrates generalized differences in the area for most MUs due to difference in MU components.A similar area and proportion for GXbd (around 565 ha and 13.4 % of the settlement on the conventional compared to 10.7 to 12.9 % on the DSM maps).The LAd showed a 10 % increase in the area mapped by the conventional method.Cambissolos Háplicos Tb Distróficos, and for PVAd, both appear in soil associations, therefore could not be directly compared.The PVAd was not mapped as first soil component of MUs on the conventional map as its similarity to Latossolos allows distinction only on the textural gradient of the B horizon, and CXbd was second or third component on three MUs.
The MU areas in each map and the corresponding percentages of occurrence (Table 3), show large discrepancies for two MUs.For example, the mapped area for MUs for LVAd varied from 809 ha, mapped by the xgBoost algorithm, to 42 ha, mapped by the SVMPoly.The opposite was observed for CXbd, with the smallest area mapped by xgBoost (1,332 ha) and the largest by the SVMPoly (2,166 ha).The discrepancy between these two algorithms in the mapping of LVAd and CXbd can be attributed to the SVMPoly error in mapping soils on hillslopes, as observed in the confusion matrix, where all data of LVAd was allocated to CXbd (Table 2c).Apart from these two MUs, all the remaining MUs showed low standard deviation (0.4 to 2.1 %) in their mapped area.
To assess whether the maps had statistical differences based on the Kappa index values, we used the Z-test (significance level of 0.05), with no differences between algorithms Rev Bras Cienc Solo 2018;42:e0170421 (Table 4).Although the tree-based algorithms (RF, Ranger, and WSRF) performed best in mapping the soils, there was no statistical difference in the mapping accuracy of these algorithms compared to binary regression models.Rev Bras Cienc Solo 2018;42:e0170421 In this respect, Brungard et al. (2015) investigated the performance of eleven algorithms and reported similar accuracies for mapping soil classes in three different sites in the USA with three datasets of covariates, with no statistical difference by the T-test.In most models, the highest performance was by RF.Mosleh et al. (2017) compared four classifiers (MLR, RF, ANNs, and Boosted regression tree) for soil mapping in Iran.The authors investigated model performance on all soil taxonomic levels, from order to family, and found out that all algorithms had the same ability at any determined taxonomic level, with better accuracy at higher taxonomic levels.

Assessment of MU variability in DSM
The MU variability of the eight maps, assessed pixel by pixel (Figure 5), shows agreement or disagreement among machine learning algorithms in mapping.In less than 1 % of the areas, up to five different MUs showed disagreement in mapping.The darkest colored parts of the map show areas where the algorithms most disagreed, and, conversely, the lightest colored areas indicate where the algorithms most agreed.
The number 1 (yellow) designation corresponds to areas where all algorithms agreed in classification of the MU, representing 43 % of the total settlement area, whereas the number 2 label corresponds to areas where one algorithm classified the MU differently, representing 45 % of the total area.Areas labelled 3, 4, and 5, representing 3, 4, and 5 disagreements, respectively.Together, 3, 4, and 5 comprise about 12 % of the total area and therefore indicate low variability of soil mapping by machine learning analyses in the settlement area.As can been seen in figure 5, the areas in green, disagreement occurring between two (2) MUs, are mostly the areas of LVAd that were misclassified by the SVMPoly due to confusing LVAd and CXbd.In the case of this study, we consider that the DSM models showing good agreement in mapping highlight the SVMPoly as the algorithm with the lowest performance for spatial mapping.Rev Bras Cienc Solo 2018;42:e0170421

Analysis of map agreement
Maps showing concordance between the conventional soil map and the maps generated by the machine learning algorithms are presented in figure 6.The results of this comparison show good agreement in gentle hilly areas and lowland landscapes.The greatest disagreements occurred in the mountainous areas with varying concave and convex slopes and steeper slopes.The proportion of agreement between machine learning mapping and conventional mapping ranged from 63 to 67 %, minored by similar spatial pattern of the maps (Figure 6).
The mapping agreements observed in this study for the eight algorithms are similar to those found by Chagas et al. (2017); they found agreement between the conventional map and DSM maps as follows: for RF, 68.7 %; for ANNs, 62.8 %; and for Decision trees, 62.3 %.The approach by Heung et al. ( 2016) calculated agreement based on point-observations from 262 soils extracted from a polygon map, which differs from the area-base comparison adopted in this study.Their mapping showed agreement of 72 % for RF and 63 % for the SVMLinear.
Considering the mapping scale of this study (1:50,000), we can consider the results as superior to those of Vasques et al. (2015), who compared the agreement of a soil map of logistic regression with a conventional soil map on 1:100,000 and 1:20,000 scales, and reported agreements of 45 and 32 %, respectively.Similarly, Bazaglia Filho et al. (2013) compared a soil map generated by a Fuzzy k-means algorithm with conventional maps on a 1:10,000 scale that were produced by four experienced pedologists.Their comparison showed agreement ranging from 56.26 to 71.85 % for MUs classified at the order level.
In order to identify areas of uncertainty within the MUs mapped with the machine learning algorithms, we compared the percentage of disagreement for each MU of the conventional map (Figure 6i).The LAd unit showed the highest percentage of disagreement, at 52 %, which can be explained by the fact that its occurs as a single soil class in the MU.In contrast, the RL unit did not show disagreement for any of the models, whereas CXbd and GXbd units had the lowest disagreement among the maps (Figure 6i) at only 3 % for CXbd and 5 % for GXbd.Since they are in association, there is increasing probability of agreement with digital mapping.
As a follow up to this study, further soil sampling should be carried out in areas of disagreement to assess possible approaches to increase the accuracy of machine learning.The map of MU variability generated from all machine learning models for soil mapping, and maps showing agreement between the conventional soil map and the DSM allow identification of target areas where mapping uncertainty requires further soil surveys.Hence, machine learning algorithms can be envisaged as a DSM tool with better performance in saving resources and helping map soils in tropical areas of hilly, complex landforms.

CONCLUSIONS
The machine learning algorithms used showed satisfactory performance (Kappa index ranging from 0.42 to 0.48) in soil mapping of a tropical dissected and mountainous terrain.The statistical similarity between the algorithms used for mapping soils shows their feasibility in mapping tropical landscapes.
The most confused mapping units (MUs) were the Cambissolos (CXbd) and Latossolos (LVAd, LAd) on the sloping highlands, while the lowland Gleissolos (GXbd) showed the greatest accuracy in the mapping procedure under all machine learning models.
By showing almost half of the area in agreement, variability of MUs mapped by DSM algorithms was satisfactory, whilst only a small proportion of the area showed disagreement Rev Bras Cienc Solo 2018;42:e0170421 among three or more algorithm.From this, we can conclude that excluding the SVM polynomial bases, all machine learning procedures performed similarly.
As DSM mapping showed satisfactory concordance with the conventional soil map, in combination with the other observations made in this study, the application of pedometric methods such as DSM algorithms should be seriously considered as a complementary approach to conventional methods for mapping complex tropical areas.

Figure 1 .
Figure 1.Localization of the soil pits used for soil classification in the Dênis Gonçalves Settlement, Mesoregion of the Zona da Mata region, Minas Gerais, Brazil.

Figure 2 .
Figure 2. Conventional soil map showing the distribution of soil mapping units (MUs) and their areas of occurrence in the Dênis Gonçalves Settlement, in the Zona da Mata region, Minas Gerais, Brazil.

Figure 3 .
Figure 3. Importance of covariates in the soil mapping, measured by the Gini index as implemented in the Random Forest model run with RFE covariate selection.

Figure 5 .
Figure 5. Map of MU variability among the eight soil maps generated with the machine learning algorithms in the Dênis Gonçalves Settlement (the values 1, 2, 3, 4, and 5 represent the number of distinct MUs assigned by the algorithms in a given pixel of the soil map, where 1 means agreement of all algorithms).

Table 1 .
Soil classes and number of soil pits surveyed per class in the Dênis Gonçalves settlement

Table 4 .
Accuracy performance and the significance of Z-test for the kappa indexes of each classifier