Digital soil class mapping in Brazil: a systematic review

In Brazil several digital soil class mapping studies were carried out from 2006 onwards to maximize the use of existing maps and information and to provide estimates for wider areas. However, there is no consensus on which methods have produced superior results in the predictive value of soil maps. This study conducts a systematic review of digital soil class mapping in Brazil and aims to analyze the factors which can improve the accuracy of digital soil class maps. Data from 334 digital soil class mapping studies were grouped and analyzed by Student’s t-test, Wilcoxon-Mann-Whitney test and Kruskal-Wallis test. When conventional maps were used for validation, the studies showed average values of 63 % and when field samples were used, 56 % for Overall Accuracy. Studies compatible with the Planimetric Cartographic Accuracy Standard for Digital Cartographic Products (PEC-PCD) averaged between 4 % and 15 % higher accuracy than those of the incompatible group. There seems to be no evidence that increasing the number of variables and samples results in more accurate soil map prediction, but studies using variables related to four soil-forming factors enhanced accuracy. From a density of 0.08 MU km–2 and upwards, it became more difficult for studies to obtain greater accuracy. Artificial neural network classifiers and Decision Tree models seem to be producing more accurate digital soil class maps.


Introduction
Soil maps are crucial to environmental and agricultural management but conventional soil mapping is costly and time-consuming and existing soil maps are lacking in details. In the past few decades, digital soil mapping (DSM) methods have been tested and analyzed by the scientific community to maximize both the use of existing maps and information and to provide estimates for wider areas.
There is geographic variation in the uptake of digital soil mapping technologies and certain countries have made considerable progress. However, Minasny and McBratney (2016) and Arrouays et al. (2017), found that methodologies and assessment of DSM results need to be standardized, errors should be minimized and better evaluated, and strategies devised to overcome the lack of detailed cartographic bases and dearth of soil maps and data.
Recent reviews indicate other challenges such as soil mapping of flat terrains, simulation of the soil heterogeneity on a regional scale, linking DSM and soil spectroscopy (Zhang et al., 2017) and the use of process-based soil-landscape evolution modelling with interactions between pedology and DSM (Ma et al., 2019).
Studies of digital soil class mapping in Brazil began in 2006 (Giasson et al., 2006) and were re-analyzed six years later by Ten Caten et al. (2012) who found an agreement average of 48 % (Kappa coefficient) in 11 articles. Overall accuracy (OA) was not analyzed and the classifiers most used were logistic regressions, though there is no consensus on which methods have shown better results in the prediction of soil maps. Cancian et al. (2018), conducted a bibliometric analysis of the scientific production of DSM in the period from 1996 to 2017 and perceived that publications are increasing and that Brazilian research is gaining prominence on the world stage. The authors found approximately 200 researchers working with DSM in Brazil.
Those who intend to produce digital maps of soil classes can refer to publications that test different classifiers, sets of predictor variables and sample size. However, there is no study that presents the data from these publications in an integrated and systematic way.
In this study we sought to produce information from data from several publications to assist in decision making for those who want to produce digital soil maps. The aim was to analyze the factors used in digital soil class mapping and to assess the accuracy of the studies based on a systematic review of articles published between 2006 and 2019 in Brazil. Digital soil class mapping in Brazil Sci. Agric. v.78, n.5, e20190227, 2021 Support Vector Machines, Artificial Neural Networks, naïve Bayes, Logistic Regression and Decision Trees.
The following articles were excluded (exclusion criteria): a) articles whose study objective was the digital mapping of soil attributes or soil polygon disaggregation mapping; b) articles with no spatial soil class prediction using classification algorithms (e.g., map algebra) and/or mapping units (MU) delimited manually; c) articles with only qualitative validation of spatial soil class prediction; and d) articles that used unsupervised classification methods (clustering) and Fuzzy Logic because the process of modelling is very different from supervised learning methods that are the focus of this study.
Two electronic libraries were used for this study a) Portal de Periódicos CAPES (Coordination for the Improvement of Higher Education Personnel) and b) SciELO (Scientific Electronic Library Online). These are electronic libraries financed by public funds to promote free access of scientific journals. The Portal de Periódicos CAPES offers free access for professors, researchers, students and employees of participating institutions such as all federal institutions of Brazilian higher education among others. The SciELO promotes free public access to scientific journals from developing countries. Currently, Portal de Periódicos CAPES offers access to complete texts available in more than 38,000 national and international journals, while SciELO has 1,285 active journals.
For the moment and the objectives of this study, research in these two electronic libraries proved to be adequate. However, researchers are being encouraged to internationalize Brazilian research and for future studies it is worth considering consulting other databases as (e.g., Web of Science and Scopus).
All the articles from the survey were tabulated and analyzed. If an article met the inclusion criteria then it was included for participation in this study; if an article met the exclusion criteria then it was not included. The Mendeley Reference Manager (Dearden et al., 2011) was used for bibliographic management.

Database
For the construction of the database, data of all studies contained in the articles were extracted. We consider all the soil class prediction tests carried in an article as studies; e.g., if in an article featured three learning algorithms were compared for prediction soil classes, then these tests were considered as three studies. The same occurs if two or more sets of predictor variables were compared or any tests that were performed that present the respective validation values. Thus, the result of this is a relationship of "one-to-many"; i.e., one article to many studies.
The following quantitative and qualitative data were extracted from the studies of the articles selected: a) year of publication; b) reference city of the study area; c) size of study area (km 2 ); d) cartographic scale; e) number of mapping units; f) number of samples used in the predictive models (pixels from raster data of legacy map and/or points of fields observations); g) digital elevation model (DEM) used; h) pixel size (m); i) number of predictor variables used; j) learning algorithms; k) overall accuracy (%); and l) Kappa coefficient (%).

Reproducibility and exactness assessment groups
The studies were assigned to the reproducibility assessment group where they were validated using conventional maps. Once they were validated using points of field observations they were assigned to the exactness assessment group.

Soil-forming factors
All the predictor variables used in the selected articles were extracted and assigned to a soil-forming factor attribute such as climate, parent material, organisms, relief and time (Table 1). Thus, it was possible to calculate the number of soil-forming factors used in each study.
A number of the predictor variables have indirect or multiple-factor relationships with a soil-forming factor (Ma et al., 2019); in this study we associated variables that were directly related to a particular factor.
For the climate soil forming factor, predictor variables that influence temperature and soil moisture were assigned. For the parent material the characteristics of the mineral solid soil phase as well as the lithology of the environment used whereas for the organism the biological characteristics of the environment, and for relief the terrain model derivatives were used. For the time factor, only the geomorphic surface variable was used and this was used in two studies (Arruda et al., 2013 andArruda et al., 2016).
The principal components were used as predictor variables for elevation, hydrology and curvature (Ten Caten et al., 2011a), the only component characteristics explained by the authors were assigned to a soil-forming factors.

Compatibility with the Brazilian map accuracy standard
The Cartographic Accuracy Standard for Digital Cartographic Products (Padrão de Exatidão Cartográfica dos Produtos Cartográficos Digitais -PEC-PCD) is the Brazilian standard for the evaluation of the map accuracy published in the version 2.1.3 of the "Especificação Técnica para Aquisição de Dados Geoespaciais Vetoriais -ET-ADGV" (Diretoria de Serviço Geográfico, 2011). According to this standard, digital products are classified into four classes ("A" -more accurate, "B", "C" and "D" -Digital soil class mapping in Brazil Sci. Agric. v.78, n.5, e20190227, 2021 less accurate) that indicate acceptable both altimetric and planimetric errors at different cartographic scales; e.g., for the scale 1:10,000 the PEC-PCD planimetric values are: 2.8 m ("A"), 5.0 m ("B"), 8.0 m ("C") and 10.0 m ("D"). Thus, we infer the compatibility of the studies to the PEC-PCD planimetric by the pixel size and scale used; e.g., if a study used pixel size of 15 m and scale 1:10,000, it was considered incompatible with the PEC-PCD.
The studies were divided into groups of compatible and incompatible with PEC-PCD. In this study, the compatibility with planimetric PEC-PCD only indicates the studies that exhibited compatible scale and pixel size and not the positional precision of variables in relation to field coordinates.

Classifier groups
The studies were grouped according to the type of learning algorithms used. The list of all algorithms used in the studies and the types of classifiers to which they were associated are presented in the results and discussion item.

Statistical methods for comparing the groups
The method of Zuur et al. (2010) was used for exploratory data analysis which includes graphical observations and statistical tests in an R environment.
Overall accuracy data normality was tested by the Shapiro-Wilk and Kolmogorov-Smirnov statistical test, whereas Brown-Forsythe test (modified Levene test) was applied to analyze the homogeneity of variance.
When both normality and homogeneity assumptions were met, parametric tests were applied. The Student's t-test was used to compare the means between two groups.
Where both normality and homogeneity assumptions were not met, non-parametric tests were applied to the non-transformed data; i.e., the original overall accuracy data. The Wilcoxon-Mann-Whitney test was used to determine whether the distributions between two groups were equally located. The Kruskal-Wallis test was applied to verify if there were differences between three or more groups. When an inter-group difference was observed, the Dunn post-hoc test was used in each pair of groups.
Where the overall accuracy data met one of the assumptions (i.e., normality or homogeneity), they were transformed using the Box-Cox method (Box and Cox, 1964). The transformed overall accuracy data were tested by the Shapiro-Wilk and Kolmogorov-Smirnov for test data normality; and the Brown-Forsythe for testing the homogeneity of variance. If both normality and homogeneity assumptions were met, parametric tests were applied to the transformed data; otherwise, non-parametric tests were applied to the non-transformed data.

Bibliographic analysis
We included 42 articles that met the requirements (i.e., inclusion and exclusion criteria) for participation in this study (Table 2). These articles contained 334 digital soil class mapping studies. The first digital soil class mapping article in Brazil was produced in the state of Rio Grande do Sul (Giasson et al., 2006); the authors evaluated logistic regressions to reproduce soil maps from a reference area. More than half of all the articles were conducted in the state of Rio Grande do Sul (24), followed by São Paulo (7), Minas Gerais (6) and the state of Rio de Janeiro (5). In most of the country, no digital soil class mapping articles that meet the requirements for participation in this study have been conducted. An average of three articles per year were published during the study period (2006 to 2019).

Descriptive statistics of the data extracted from the studies
The study areas varied from 1.75 km 2 (175 ha) (Pelegrino et al., 2016) to approximately 120,000 km 2 ,  (2017)). The number of MU varied, using a simplified legend, (Figueiredo et al., 2008), between 3 and 34 (Vasques et al., 2015;Silvero et al., 2019) with an average of 9.5 MU per study. About 75 % of the studies had a ratio of up to approximately 0.4 MU km -2 but one study carried out by Pelegrino et al. (2016) stands out for its high ratio of 2.8 MU km -2 , with a study area of 1.75 km 2 and five soil classes. The median number of samples was 2,463. The lowest number of samples (74) was found in studies that used field observations for spatial soil class prediction models based on Decision Trees and Logistic Regression algorithms (Silva et al., 2019). Nevertheless, in this article the authors conducted other studies with additional points that improved the prediction performance of each model. The largest number of samples (794,273) was found in a study conducted by Crivelenti et al. (2009) that used pixels from raster data to spatial soil class prediction models based on Decision Tree models.
The highest number of samples per area was 10,024 per km 2 , in a study by Pelegrino et al. (2016), when the authors used 17,542 samples in an area of 1.75 km 2 . Nevertheless, Dias et al. (2016) used 1,710 samples in an area of 1,100 km 2 . In both studies, pixels from raster data were used as samples. The size of the study area and the number of samples are not correlated (Pearson (r = 0.01)) which may reveal the lack of standardization in digital soil mapping (Ten Caten et al., 2012;Minasny and McBratney, 2016;Arrouays et al., 2017).
A mean of nine predictor variables were used per study; i.e., the variables selected and used in the predictive models per study. The maximum number of variables used in the same study was 43 (Silva et al., 2019); on the other hand, in one of the studies (Pelegrino et al., 2016) only two variables were used (aspect and wetness index) obtaining overall accuracy of 50 %.
Of the 334 studies, 263 presented cartographic scale and spatial resolution (pixel size) information used (Table 3); 38 studies were found incompatible with the planimetric PEC-PCD, since their pixel size is higher than that indicated at the cartographic scale.

Learning algorithms and types of classifiers
The following learning algorithms were used in the selected articles: Bagged AdaBag, BF Tree , C5 Decision Tree, CART, ExtraTree, J48, Logistic Model Trees, Maximum Likelihood, Multinomial Logistic Regression, Multilayer Perceptron, PART, Random Forest, Ranger Random Forest, Rep Tree, Support Vector Machine with Linear Kernel, Support Vector Machine with Polynomial Kernel, Weighted Subspace Random Forest -WSRF and xgBoost.
All learning algorithms were assigned to a type of classifier such as Artificial Neural Network (ANN), Bayes classifiers, Decision Tree (DT), Logistic Regression (LR) and Support Vector Machine (SVM). Approximately 95 % of the studies used DT, ANN and LR classifiers (Table 4).

Overall accuracy and Kappa coefficient
Of the 334 studies, the OA was used in 320 (96 %) while the Kappa was used in 190 (57 %). Although we do know that Kappa is often seen as problematic, if not flawed, because of a past attempt to compare accuracy to a baseline of randomness (Pontius and Millones, 2011), we analyzed it taking into account its frequent use in studies of digital mapping of soil classes.
The OA and Kappa medians were 62 % and 48 % of agreement respectively. The estimated population confidence interval for the OA median was 59 % to 63 % (CI (95 %) = 59 % -63 %). The Kappa value remained the same as that found by Ten Caten et al., 2012) when carrying out an evaluation of 11 Brazilian studies of digital mapping of soil classes.
The agreement variation in the Kappa is higher than the OA; the OA agreement probability density is higher than the Kappa as shown by the shape of the plot (Figure 1). The OA outliers are those below 23 %, as identified in articles by Chagas et al. (2010), Vasques et al. (2015), Pelegrino et al. (2016) and Silva et al. (2019).

Reproducibility and exactness assessment for Overall Accuracy
When conventional maps were used for validation (reproducibility assessment group), the studies showed average values of 63 % for OA; when field samples were used for validation (exactness assessment group), the studies showed average values of 56 %.
Through the Box -Cox transformation of the OA data, it was possible to use parametric testing for the reproduction and accuracy groups. Student's t-test results (t (150.53) = 3.73, p < 0.05)) indicates that the two validation groups are different and with a 95 % confidence interval the difference between estimated population means was 4 % and 12 %.
These findings indicate that the digital soil maps generated tend to have higher agreement in reproducibility than in exactness assessment; i.e., they were more accurate in reproducing legacy maps than representing actual soil distribution. That is understandable because when the legacy maps are used for training prediction models, the soil mapping units already correspond to well-identified landscape units, which makes adding more precise and up-to-date predictor variables useful to producing a better map (Ma et al., 2019).
Furthermore, the conventional maps used for validation are composed of polygons of MU and there is a relationship of "one-to-many"; i.e., one MU to many classified pixels. The validation by field data is usually performed by points and there is a relationship of "oneto-one"; i.e., one point to one classified pixel. In this way, validation by conventional maps increases the chances of random classification hits occurring within an MU. In addition, in the validation by field data any positional inaccuracy both of the points and that of predictor variables can compromise the validation. Therefore, the evaluation of the reproducibility is not enough to evaluate the exactness of predicted maps. For that, field data is necessary.

Environmental factors
There is a small correlation between the OA values and the size of the study area (Pearson (r = 0.18)); Spearman (ρ = 0.23). There is no correlation between the OA values and number of MUs (Pearson (r = -0.08); Spearman (ρ = 0.03)). These results suggest a random or practically non-existent association between the OA and  the size of the study area and the number of mapping units. These results are different from those found by Brungard et al. (2015) i.e., that machine learning models are most accurate when there are just a few soil classes.
To verify the effect of the two variables (the size of the study area and number of MUs) representing the density per km 2 (MU km -2 ), the k-means technique was applied to perform iterative data segmentation. After the tests, the data were partitioned into 15 groups (clusters) ( Table 5).
As densities (MU km -2 ) increase beyond the cluster 9 (lower limit = 0.08; centroid = 0.09; upper limit = 0.12), the OA is lower in most studies (Figure 2). Except for cluster 13, those above 9 showed a median OA below the lower limit of the estimated population confidence interval for the OA median (CI (95 %) = 59 % -63 %). On the other hand, for clusters below 9, the median OA in the studies are predominantly higher or within the estimated population confidence interval, except for cluster 4, whose median was below the CI (95 %). Approximately 50 % of the studies are in clusters 1 to 9 and 50 % between clusters 9 and 15. The density values (MU km -2 ) depend on the scale work and relate to the environmental complexity. The results indicate that the higher environmental complexity (with 0.08 MU km -2 or more) has a negative effect on the accuracy of the predicted maps.

Scale and pixel size
Of the studies with information available on the cartographic scale and pixel size, validated by the OA index, 85 % used spatial resolution compatible and 15 % incompatible with the planimetric PEC-PCD.
The result of the Wilcoxon Mann Whitney test (p < 0.05) suggests that the OA of the population median for the group of studies compatible with the PEC-PCD is higher than that of the incompatible group. The shape of the plot suggests that the OA probability density for the group of studies compatible with the PEC-PCD is higher than the CI (95 %) (Figure 3). The inter-group difference in the estimated population median is between 2 % and 12 % with a 95 % CI.

Sample size and density
There is no correlation between OA values and the number of samples (r = 0.04). Neither is there between OA values and sample density per km 2 (r = -0.08). In a study that aimed to extrapolate the soil map (Grinand et al., 2008) the authors found that the increase in the  number of samples improved prediction accuracy, whereas the increase in sample density did not improve accuracy.
It is important to underscore that these results are restricted only to sample number and density and not to sampling method. As such, prediction map agreement is associated with environmental and modelling factors which may include sampling methods.

Number of predictor variables and of soil-forming factor
There is no correlation between OA results and the number of predictor variables used (Spearman (ρ = -0.02); Pearson (r = 0.20)). The association between predictor variables and the respective soil-forming factor (Table 1) results in the number of formation factors used per study: a) only one formation factor (7 % of the studies); b) two factors (49 %); c) three factors (33 %); d) four factors (5 %); and e) no information available (6 %). None of the studies was associated with five soilforming factors. There is a trend to higher OA results as the number of formation factors and variables increased ( Figure 4A and B).
The result of the Kruskal-Wallis test (H (3) = 28.91, p < 0.05) indicated a difference between the groups of studies in which different numbers of soil-forming factors were used. The results of Dunn's post-hoc test indicated differences between groups with one and two, one and three, one and four, two and four, three and four formation factors (p < 0.05) and equality between those with two and three formation factors (p > 0.05).
The results indicate that the techniques applied in the set of studies analyzed here are sensitive to the conceptual structure given by the paradigm of soilforming factors adapted to digital soil mapping using the scorpan model (McBratney et al., 2003). The more completely the scorpan model is applied, the better the results obtained.

Classifiers
Due to the greater representativeness (95 % of studies), we compared the OA results of the following groups of classifiers: Decision Trees (DT; mean = 62 %; median = 63 %), Artificial Neural Network (ANN; mean = 67 %; median = 68 %) and Logistic Regressions (LR; mean = 45 %; median = 43 %). The results of the Kruskal-Wallis test (H (2) = 62.34, p < 0.05) indicated a difference between the groups analyzed. The results of Dunn's post-hoc test indicated that the OA is different between LR and DT, LR and ANN (p < 0.05) but between DT and ANN (p > 0.05) there is no difference. Figure 5 shows that the group of studies that used ANN had a median OA higher than the estimated populational CI (95 %). However, there is no evidence that ANN models are better than DT models. On the other hand, there is evidence that ANN and DT models are better than LR for predicting soil classes. An evaluation of the prediction models of soil properties (Khaledian and Miller, 2020) concluded that there is no one single correct learning algorithm. However, certain algorithms are more appropriate than others considering the purpose of the mapping. According to the authors, if the sample size is large, ANN would likely produce the best results. When interpretability of the resulting model is important LR and DT are more appropriate than others. Brungard et al. (2015) compared 11 learning algorithms and concluded that ANN and SVM were consistently more accurate than LR and DT algorithms.

Relationship between environmental and modelling factors
The results obtained in the present study demonstrate that the following factors determine higher OA values: a) density up to 0.08 MU km -2 ; b) spatial resolution and scale compatible with planimetric PEC-PCD; c) use of four or more soil-forming factors associated with predictor variables; and d) use of ANN and DT classifiers. Among these factors, only the density of MU km -2 is an environmental factor, which cannot be controlled by the user. The other factors are related to the modelling of digital soil class mapping, which are controlled by the user.
Graphical analyses ( Figure 6) and statistical tests were used to determine the possibility of bias in establishing the main factors, that is, to confirm if any modelling-related factor obtained a better OA because they were primarily distributed into lower MU km -2 values.
For the group of compatibility with PEC-PCD, there is a greater concentration of studies compatible in higher densities of MU km -2 . This indicates that there is no bias and reinforces the importance of using pixel sizes appropriate to the working scale to obtain better OA values.
For the groups of soil-forming factors and of classifiers, visual analysis was not conclusive. As such, a statistical test was necessary to determine whether the groups exhibited similar value variations in MU km -2 .
According to the Wilcoxon-Mann-Whitney (p > 0.05), data from the group with four soil-forming factors does not differ from the group with up to three factors in relation to MU km -2 , and does not, therefore, exhibit low density concentration bias. Thus, it can be considered an important factor in the improvement in soil prediction map agreement.
According to the Kruskal-Wallis test (H (2) = 18.68, p < 0.05), the classifier groups differ in relation to MU km -2 . The results of Dunn's post-hoc test indicated that  there is no difference between LR and ANN (p > 0.05). Between DT and LR, DT and ANN there are differences.
These findings indicate that the ANN performed better even though they were tested at higher densities of MU per area. In addition, the good results of the DT models may have had bias because most studies occurred in areas with lower environmental complexity.

Conclusions
Based on digital soil class mapping studies in Brazil conducted between 2006 and 2019 and considering that this is a small database, the results of this study may not be definitive, and the following can be concluded: The mean overall accuracy of the group of studies that used pixel size and cartographic scale compatible with planimetric PEC-PCD is greater than that of the group which did not use spatial resolution compatible with PEC-PCD.
There is no evidence that an increase in the number of samples and predictor variables results in more accurate soil map prediction. On the other hand, there is evidence that the use of more heterogeneous predictor variables in terms of soil-forming factors could result in improved accuracy.
The density of MU per area affects the agreement of prediction maps. From a density of 0.08 MU km -2 and upwards, it was more difficult for studies to obtain better overall accuracy values than their estimated population median counterparts.
There is evidence that ANN classifiers are more efficient than the LR in terms of predicting soil classes. There is no evidence that ANN are more efficient than the DT. However, high precision DT accuracy may have been achieved because the majority of tests were performed in areas of lower MU km -2 ; i.e., less environmental complexity.