A fuzzy logic-based expert system for substrate selection for soil construction in land reclamation

The mining industry can be one of the human activities with the greatest environmental impact. In the southern region of Santa Catarina (Brazil), open pit coal mining has left an extensive environmental impact. Since no topsoil was left in the abandoned open pit sites, it is necessary to provide a substrate for vegetation growth. However, selecting the best substrate among multiple options is difficult. Thus, a fuzzy logic-based model is proposed. The proposed model was compared to reference models and to experts’ knowledge. Statistical analysis and validation were carried out with a correlation coefficient and a Kappa coefficient, along with the Accuracy, Precision, Sensitivity, Specificity, F-Score and Matthews correlation coefficients. The data set used to assess the proposed model presented a wide range of data, but for values such as aluminum saturation, higher values were common. The fuzzy logic-based expert system presented better results when assessing the behavior of the defuzzified output values with the crisp input values. The fuzzy model also followed the trend of the reference models (with R2 between 0.3639 and 0.5250). The comparison to the experts’ opinion demonstrated that agreement comes easily with extreme values (such as not suitable and suitable). However, using a Winner-Takes-All approach, the proposed fuzzy model had high scores for soils suitable for soil construction in land reclamation. The proposed model can be used to define the best substrate for land reclamation. Some improvements, such as different parameters and an increased number of interview rounds, should also be tested.


Introduction
The mining industry is considered one of the human activities with the greatest impact on the environment. It contributes to the alteration of the earth's surface and causes significant impacts on water, air, landscape, subsoil, and soil. In the latter, the impact occurs through the removal or modification of its fertile layers. Environmental degradation is inherent to mining activities, and the intensity of the impact depends on the volume explored, the mining type and the tailings produced (MENDES FILHO, 2004; MATOS, 2011).
Even after their activities have ceased, these mined areas, if abandoned, may have negative impacts on the soil, surface water and groundwater, fauna and public health that can persist for decades (BEANE et al., 2001; TURNER, 2011; RIEUXWERTS et al., 2014).
Coal mining activities in the state of Santa Catarina (Brazil) began in the late 19th century. Initially, coal was exploited in large open-pit mines without any environmental supervision by public authorities. The result was the uncontrolled disposal of overburden and tailings from coal beneficiation (KOPPE, COSTA, 2008).
Faced with these problems and compelled by law, mining companies sought alternatives to minimize the negative impacts through the reclamation of degraded lands. The most suitable substrate for recovering degraded areas is the topsoil, which must be stored when the mining pit is opened so that it can be used in the future (ZIMMERMANN, TREBIEN, 2001). However, this practice was not adopted and the topsoil was lost, so it is now necessary to extract clayey materials from clay deposits to assist the land reclamation, in a process called soil construction.
Therefore, in order not to perpetuate this degradation cycle, it is highly recommended to store the surface soil horizons (to reclaim the clay pit) and to use the subsequent soil horizons (B and C) to rehabilitate the sites degraded by coal mining (ZIMMERMANN, TREBIEN, 2001). However, the latter horizons lack nutrients, and fertilizers need to be applied for better development of the established vegetation.
The selection of the best clay deposit for land reclamation depends on several parameters, such as the distance to the area to be reclaimed, material volume, land use and material characteristics. When several deposits are available, one should choose the one with the best attributes and cost-benefit ratio. Several decision-making models are available, such as the analytic hierarchy process (AHP), weighted sum methods (e.g. Amacher et al., 2007 and Baggio, 2010), and fuzzy logic (CORTÉS et al., 2000; MALININ, 2013). Demicco and Klir (2004) consider that one of the advantages of fuzzy logic is the possibility of modeling common sense, decision making and other human cognitive aspects, facilitating the representation of the experts' knowledge. Further details on fuzzy logic and its related terms, such as membership functions, fuzzy inference systems, the defuzzification process and others, can be found in Zadeh (1988), Mendel (1995) and Ross (2004).
Fuzzy logic has been pointed out as an efficient modeling tool with several potential uses, such as intelligent systems for home appliances and medical diagnostics, adaptation of systems to user needs, information processing for robotic systems, risk analysis and image processing, among others (AGUILAR-MARTIN, 1995; LUGER, 2013; SINGH et al., 2013).
In soil science, fuzzy logic can be applied in the analysis and classification of soils for mapping, modeling of physical processes, geostatistics, quality indexes for agricultural and degraded soils, and others (McBRATNEY and ODEH, 1997; McBRATNEY et al., 2002; KAUFMANN et al., 2009; RODRIGUEZ et al., 2016).
Considering the need to select the best materials for soil construction in areas degraded by coal mining, this article proposes a fuzzy logic-based model for the selection of deposits of clayey materials for land reclamation projects, based on their chemical and physical parameters.

Materials and methods
The assessment of the substrates to be used in the land reclamation was performed using the following parameters: clay content (cc); cation exchange capacity in the clay fraction (cec-cf); hydrogen potential (pH) in H2O; aluminum saturation (as); and phosphorus content (P). The adopted parameters are the same as those used by Back et al. (2014) and Baggio (2010), who applied a weighted sum method to select the best substrate for land reclamation.
Based on these parameters, a fuzzy inference system was developed in the R programming language (R CORE TEAM, 2015) with the FuzzyToolkitUoN package (KNOTT et al., 2013). The soil data set from EMBRAPA (2004) was used in the proposed model.
The acquisition of the experts' knowledge and the development of the membership functions of the substrate parameters were based on the technical knowledge and experience of specialists in the application field (professionals in the area of soil science and land reclamation), as well as other references such as the Brazilian Soil Science Society (SBCS, 2004) to obtain the linguistic variables and their respective values.
Trapezoidal membership functions were established from the linguistic variables, which initially defined the crisp intervals, allowing the creation of several membership functions. In order to reduce the number of membership functions, the linguistic variables were transformed into "Suitable" or "Not Suitable" for land reclamation, following the procedure of Sheehan and Gough (2016) (e.g. the pH parameter can have six functions: extremely, strongly and slightly acidic, neutral, and slightly and strongly alkaline; these classes can be converted to "Suitable" or "Not Suitable").
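As a minimal sketch of how a trapezoidal membership function converts a crisp value into a degree of membership, the Python snippet below can be considered. The function name and the pH breakpoints are hypothetical placeholders chosen for illustration; they are not the values elicited from the experts in this study.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoid with feet a, d and shoulders b, c."""
    if b <= x <= c:
        return 1.0                    # plateau: full membership
    if a < x < b:
        return (x - a) / (b - a)      # rising edge
    if c < x < d:
        return (d - x) / (d - c)      # falling edge
    return 0.0                        # outside the support


# Hypothetical breakpoints for a "Suitable" pH set (assumption, illustration only).
SUITABLE_PH = (5.0, 6.0, 7.0, 8.0)

for ph in (4.0, 5.5, 6.5):
    print(ph, trapezoid(ph, *SUITABLE_PH))
```

With these illustrative breakpoints, a pH of 5.5 lies halfway up the rising edge, so it is only partially "Suitable", whereas a crisp interval would label it either fully inside or fully outside the class.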
Other types of membership functions are available (such as triangular and Gaussian); however, Gruijter et al. (2010) point out that trapezoidal functions are more appropriate for assessing soil quality.
A Mamdani-type fuzzy inference system was used, since it supports decision making and facilitates the use of professional expertise (DOU et al., 1999; KAUR, KAUR, 2012). The specialists' expertise was encoded as a set of rules that use the AND operator to connect the statements (e.g. IF Suitable pH AND Suitable P THEN Suitable Substrate).
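The evaluation of one such rule can be sketched as follows. The snippet assumes that AND is implemented with the minimum operator, which is a common choice in Mamdani-type systems but is not stated in the text; the membership degrees are hypothetical values used only for illustration.

```python
def rule_strength(*memberships):
    """Firing strength of a rule whose antecedents are joined by AND.

    Assumption: AND is taken as the minimum of the membership degrees.
    """
    return min(memberships)


# IF Suitable pH AND Suitable P THEN Suitable Substrate
mu_suitable_ph = 0.8   # hypothetical degree to which the pH is "Suitable"
mu_suitable_p = 0.4    # hypothetical degree to which the P content is "Suitable"

strength = rule_strength(mu_suitable_ph, mu_suitable_p)
print(strength)  # the weakest antecedent limits the rule
```

Under this assumption, the consequent ("Suitable Substrate") is activated only to the degree of the least satisfied condition, which mirrors the idea that a single poor parameter limits the quality of the substrate.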
The fuzzy output value is a Quality Index derived from the fuzzy rule assessment process. The resulting defuzzified values (i.e. crisp output values) represent the degree of suitability of the material for land reclamation and soil construction. Three output sets were adopted, "Not Suitable", "Average" and "Suitable", and their intervals were equally spaced between 0 and 100, a common procedure for this type of assessment (MARCHINI et al., 2009).
The centroid method was applied for the defuzzification; it is widely used and presents results similar to those of the bisector and middle of the maximum methods. Also, its results are better than those of the largest and smallest of the maximum methods (JANG et al., 1997; LUGER, 2013; NAAZ et al., 2011). Details of defuzzification methods can be found in Klir and Yuan (1995) and Ross (2004).
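A minimal sketch of centroid defuzzification on the 0 to 100 quality index is given below. It assumes min clipping of each output set and max aggregation, as is usual in Mamdani-type systems; the shapes of the three output sets and the rule strengths are hypothetical and do not reproduce the sets used in this study.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoid with feet a, d and shoulders b, c."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical output sets on the 0-100 quality index (assumption, not the paper's).
OUTPUT_SETS = {
    "Not Suitable": (0, 0, 20, 40),
    "Average": (30, 45, 55, 70),
    "Suitable": (60, 80, 100, 100),
}


def centroid(strengths, points=1001):
    """Centroid of the aggregated output set, sampled on an even grid."""
    num = 0.0
    den = 0.0
    for i in range(points):
        y = 100.0 * i / (points - 1)
        # Clip each output set at its rule strength (min), then aggregate (max).
        mu = max(
            min(strengths.get(name, 0.0), trapezoid(y, *params))
            for name, params in OUTPUT_SETS.items()
        )
        num += y * mu
        den += mu
    return num / den if den > 0 else None   # None when no rule fired


# Hypothetical rule strengths for one soil sample.
index = centroid({"Not Suitable": 0.2, "Average": 0.7, "Suitable": 0.1})
print(index)
```

Because the centroid is an area-weighted average, it cannot reach the ends of the scale: even when only the lowest or the highest output set fires, the defuzzified value stays inside the support of that set rather than at 0 or 100.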
In order to validate this fuzzy logic model, the results were compared to the Soil Quality Index proposed by the United States Department of Agriculture (AMACHER et al., 2007) for forest soils and to the weighted sum method proposed by Baggio (2010). The soil sample data used can be found in EMBRAPA (2004).
In addition, soil fertility experts were consulted to compare their opinions with the output of the proposed system and to assess the agreement between them, using the Kappa coefficient (LANDIS, KOCH, 1977) and the Accuracy, Precision, Sensitivity, Specificity, F-Score and Matthews Correlation Coefficient (SOKOLOVA, LAPALME, 2009; POWERS, 2011).

Description of the inputs and outputs of the proposed fuzzy model
Embrapa's (2004) soil data set was loaded in R and the descriptive statistics were calculated (Table 1), along with the results of the proposed fuzzy model. The defuzzified results of the fuzzy inference system (i.e. the proposed fuzzy model) presented a mean of 32.67 and minimum and maximum values of 16.75 and 82.40, respectively. Kaufmann et al. (2009) adopted a difference of 0.5 as the threshold for a significant difference between their results (which varied between 1 and 5). Assuming a similar approach and applying a difference of 10 to distinguish the results from one another, it was possible to differentiate the scores of the soils. Also, to assess the defuzzified output, the range of the quality index was divided into three (expressing the three proposed fuzzy outputs), where the first interval (0 to 33) refers to a low quality soil; the second interval (33 to 66) to an average quality soil; and the third interval (66 to 100) to a good quality soil for soil construction in land reclamation.
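The division of the quality index into three intervals can be sketched as a simple lookup. The function name is hypothetical, and the treatment of the exact boundary values (33 and 66) is an assumption, since the text does not state to which class they belong.

```python
def quality_class(index):
    """Map a defuzzified quality index (0-100) to one of the three output classes.

    Assumption: values of exactly 33 and 66 are assigned to the "Average" class.
    """
    if index < 33:
        return "Not Suitable"
    if index <= 66:
        return "Average"
    return "Suitable"


# Minimum, mean and maximum defuzzified values reported in the text.
for value in (16.75, 32.67, 82.40):
    print(value, quality_class(value))
```

Applied to the summary statistics reported above, the minimum and the mean both fall in the lowest interval, while the maximum falls in the highest one.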
In general, 61.5% of the evaluated soils are not suitable for soil construction (defuzzified value lower than 33), especially considering the low nutrient levels and high acidity of the soils of Santa Catarina (SBCS, 2004). Only 6.0% achieved a good score (defuzzified value higher than 66), corroborating the previously mentioned considerations. Even so, another 32.5% are acceptable (defuzzified value between 33 and 66) and can be used for land reclamation, bearing in mind the possible costs of agricultural inputs. Analyzing the mean values presented, a gradual reduction in the clay content is observed as the quality index increases, since higher clay contents can hinder the mechanical incorporation of agricultural inputs and make the soil more resistant to changes due to its buffering capacity (MALAVOLTA, 1981).
Although there are not many sandy soils in the data set used, they are also detrimental to land reclamation because they retain few nutrients and little moisture, hindering the establishment of vegetation on the site.
An inverse behavior was observed for the mean values of the cation exchange capacity of the clay fraction, which is related to the soil's clay content: its increase was correlated with an increase in the quality index. Higher values of cation exchange capacity of the clay fraction contribute to higher availability of active sites, which allows greater fixation of nutrients and their release over time (TOMÉ JUNIOR, 1997).
Regarding the mean values of pH, given the lack of alkaline soils in the data set, the quality index increases as pH approaches 6 and decreases as pH falls. The optimum values (close to 6.5) are those at which the availability of nutrients in the soil for plant development is greatest (MALAVOLTA, 1981).
In contrast to what was seen for the pH parameter, higher aluminum saturation values led to lower quality values for the sampled soil. It is emphasized that higher pH values and low aluminum saturation values result in smaller applications of acidity-neutralizing amendments for soil liming, reducing the costs of land reclamation.
As for the mean values of phosphorus content, they are positively related to the values of the quality index, and higher values reduce the need for agricultural inputs, such as fertilizers.

Correlation with reference models
The quality indexes obtained by the fuzzy inference system captured the same trends as the reference models, such as the models of Amacher et al. (2007) and Baggio (2010), although with different determination coefficients (Figure 1). For the model of Amacher et al. (2007), it is noted that there is a better fit of the data, resulting in an R2 of 0.5250.
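For reference, the determination coefficient of a simple linear fit between two sets of scores can be computed as the squared Pearson correlation, as sketched below. The text does not describe how R2 was obtained, so the linear-fit assumption is ours, and the two short score lists are hypothetical values, not data from this study.

```python
def r_squared(x, y):
    """Determination coefficient of a simple linear regression of y on x.

    For a straight-line fit this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical scores: reference index versus fuzzy quality index.
reference_scores = [1, 2, 3, 4]
fuzzy_scores = [1, 3, 2, 4]
print(r_squared(reference_scores, fuzzy_scores))
```

A value close to 1 would indicate that the two indexes rank and space the soils almost identically, while intermediate values indicate that the models share a trend but differ in detail.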
The difference between the models is due to the fact that the methodologies adopted by the previously mentioned authors are based on a weighted sum of points and crisp intervals. In the proposed fuzzy logic model, any variation in an input leads to a change in the output value (i.e. the quality index), whereas with crisp intervals, if the change occurs within a single class interval, the output value is not modified.
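The contrast between crisp intervals and fuzzy membership can be illustrated with a short sketch. The class limits, the point value and the trapezoid breakpoints below are hypothetical and are not taken from the reference models or from the proposed system.

```python
def crisp_points(ph):
    """Crisp scoring: every pH inside the class interval earns the same points.

    Assumption: a hypothetical class interval of 5.5 to 7.0 worth 1 point.
    """
    return 1 if 5.5 <= ph <= 7.0 else 0


def fuzzy_degree(ph, a=5.0, b=6.0, c=7.0, d=8.0):
    """Trapezoidal membership with hypothetical breakpoints a, b, c, d."""
    if b <= ph <= c:
        return 1.0
    if a < ph < b:
        return (ph - a) / (b - a)
    if c < ph < d:
        return (d - ph) / (d - c)
    return 0.0


for ph in (5.6, 5.9):
    print(ph, crisp_points(ph), round(fuzzy_degree(ph), 2))
```

Both pH values receive the same crisp score, whereas their membership degrees differ, so the fuzzy output responds to the change while the point-based output does not. Note that on the plateau of a trapezoid the membership degree is also constant.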

Comparison with experts' knowledge
The comparison of the fuzzy inference system with the experts' opinions was performed with 30 samples (taken from the previously used data set and from the authors' collection). Of the 14 forms sent out, only 7 were answered by the experts.
The mean Kappa coefficient obtained when comparing the experts' opinions pairwise was 0.40 ± 0.17. Fleiss' Kappa coefficient, which compares all opinions together, was 0.30. Adopting the Winner-Takes-All strategy (where the majority opinion prevails), the Kappa coefficient was 0.62; four samples were discarded because there was no consensus.
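The two ingredients of this comparison, Cohen's Kappa between two raters and a Winner-Takes-All vote, can be sketched as follows. The rule used to declare "no consensus" (a tie for the most voted class) is an assumption, and the short label lists are hypothetical, not the experts' actual answers.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)


def winner_takes_all(votes):
    """Most voted class, or None when the top classes are tied (assumed rule)."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical classifications of four samples by two raters.
expert = ["Suitable", "Suitable", "Not Suitable", "Not Suitable"]
system = ["Suitable", "Not Suitable", "Not Suitable", "Not Suitable"]
print(cohen_kappa(expert, system))

# Hypothetical votes of three experts for one sample.
print(winner_takes_all(["Suitable", "Suitable", "Average"]))
```

In this sketch, samples for which `winner_takes_all` returns `None` would be left out of the comparison with the model, in the same spirit as the samples discarded for lack of consensus.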
The Kappa coefficients vary between substantial (0.61 to 0.80) and fair (0.21 to 0.40), and among all the experts (Fleiss' Kappa) the agreement was also fair (LANDIS, KOCH, 1977). In some specific situations there was almost no agreement between the experts; e.g. one of the experts frequently presented low agreement values (the smallest value was 0.01) when compared to the others.
This is due to the intermediate values ("Average" class), for which the divergence of opinion is greater. When the classification involved extreme values (i.e. "Suitable" and "Not Suitable"), the experts' agreement was higher.
Similar results have been observed by authors in other fields, such as Perroca and Gaidzinski (2003) in the diagnosis of disease stages, where the most serious and the mildest cases were classified more easily. Scardi et al. (2008) assessed the ecological quality of rivers, where the most extreme situations (e.g. High and Low) were classified with greater accuracy and intermediate situations presented more errors.
The professional background of the experts, as well as the region where they work, also affects their classifications: since soil characteristics are heterogeneous, they influence what each expert considers fertile (good quality) and infertile (bad quality).
A confusion matrix was constructed using a Winner-Takes-All approach and the accuracy, precision, sensitivity and F-Score parameters were calculated, according to Sokolova and Lapalme (2009) and Powers (2011).
The results of the confusion matrix demonstrate that the fuzzy inference system correctly classifies substrates with suitable characteristics for land reclamation, since all the evaluation indexes for this class are equal to 1. However, the same did not occur for the "Average" and "Not Suitable" classes. The fuzzy inference system scored an accuracy of 0.81 for the "Average" and "Not Suitable" classes. In other words, for these two outputs, it classifies correctly 81% of the time. As for precision, the two extreme outputs ("Suitable" and "Not Suitable") were correctly indicated all the time (value equal to 1). However, for the intermediate class ("Average"), this occurs only 29% of the time. On the one hand, the system is sensitive for the output values "Suitable" and "Average" (index equal to 1); on the other hand, for the "Not Suitable" output, it misclassified some samples (index = 0.75).
The Specificity parameter had good results for the "Suitable" and "Not Suitable" classes, while for the intermediate class (i.e. "Average") there is a chance that the system will confuse it with another class (specificity = 0.79).
Results for the F-Score ranged from 0.44 ("Average" class) to 1.00 ("Suitable" class), while the Matthews Correlation Coefficient (MCC) ranged from 0.48 ("Average" class) to 1.00 ("Suitable" class). These results demonstrate that the proposed system correctly classifies suitable substrates for land reclamation. However, the system had difficulty in defining the "Average" output, confusing it with the "Not Suitable" class.
Although the system needs to be improved, the misclassification between "Not Suitable" and "Average" does not compromise decision making in substrate selection for land reclamation, because in both situations the input of fertilizers and soil amendments would be necessary either way.
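The per-class indexes discussed in this section can be computed from one-vs-rest counts of a confusion matrix, as in the sketch below. The confusion matrix itself is not shown in this section, so the counts used for the "Average" class (2 true positives, 5 false positives, 0 false negatives and 19 true negatives over 26 samples) are an inference: they were chosen because they reproduce the reported values after rounding, and they should be treated as an assumption.

```python
import math


def ratio(numerator, denominator):
    """Safe division: returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


def class_metrics(tp, fp, fn, tn):
    """One-vs-rest evaluation indexes for a single class."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Inferred (assumed) counts for the "Average" class; see the lead-in above.
average = class_metrics(tp=2, fp=5, fn=0, tn=19)
for name, value in average.items():
    print(name, round(value, 2))
```

Read this way, the low precision of the "Average" class comes from false positives: with these counts, every sample the experts considered "Average" is recovered, but several samples from another class are also assigned to it.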

Conclusions
Fuzzy logic has stood out in several fields for its satisfactory results. Its application in decision-making processes and soil science is also well established. In the land reclamation field, it is necessary to select the best substrate for soil construction. A fuzzy logic model was proposed to evaluate the best substrates to be used in the land reclamation process.
The proposed model was able to capture the trends of weighted sum-based models. However, different determination coefficients were obtained, due to the use of crisp intervals in the reference models.
The results were also compared to the experts' knowledge, and agreement coefficients between fair and substantial were obtained. However, disagreement was observed among the experts themselves, and improvements should be made in future research in order to obtain greater agreement among them (e.g. adopting the Delphi method or increasing the number of interview rounds).
With the sample data set used, it was possible to select the most appropriate substrate for soil construction. Nevertheless, there were difficulties in classifying the intermediate situations ("Average" class). This was also observed when comparing the opinions of the specialists: agreement was easier when classifying substrates in the extreme classes (i.e. "Suitable" or "Not Suitable").
The proposed fuzzy logic model facilitates the decision-making process by modeling the experts' knowledge. Additional studies can be performed with different parameters and other artificial intelligence techniques, such as neural networks, to try to improve the results.

Figure 1. Comparison between reference soil quality indexes and the proposed model.
Table 2 presents the mean values of each score interval.