Estimation of elemental composition of agricultural soils from West Azerbaijan, Iran, using mid-infrared spectral models

Abstract: Characterizing the elemental composition provides useful information about the weathering degree of soils. In Miandoab County, Northern Iran, this characterization was missing, and thus the objectives of this work were to evaluate the weathering degrees for the most typical soils in the area from their elemental compositions, and to estimate this elemental composition using Fourier transform infrared spectroscopy and Random Forest models. Five soil profiles, including Aridisols and Inceptisols, were selected as the most representative of the area. Major elemental oxides were determined in each genetic horizon by X-ray fluorescence, showing that these soils were at early developmental stages. Only Al₂O₃ and CaO were accurately estimated, with R² values of 0.8, and out-of-bag mean square errors of 0.2 and 1.1, respectively. The other oxides were not predicted satisfactorily, probably due to small differences in their elemental compositions. Random Forest provided the important spectral bands related to the content of each element. For Al₂O₃, these bands were between 500 and 650 cm⁻¹, which represent out-of-plane OH bending vibrations and Al–O gibbsite and alumino-silicate vibrations. For CaO, the most important bands are related to carbonate content. A combination of Fourier transform infrared spectra and Random Forest models can be used as a rapid and low-cost technique to estimate the elemental composition of arid and semi-arid soils of Northern Iran.


Introduction
Weathering is defined as the alteration of parent materials by physical, chemical and biotic processes (Osat et al., 2016). The most important processes are those related to the chemical transformation of existing minerals, which leads to their depletion (e.g., by hydrolysis, oxidation, hydration or dissolution) and the formation of secondary minerals and hydrous oxides (Jeleńska et al., 2008).
The determination of elemental composition is commonly used to assess the weathering intensity of soils, and it is usually performed using X-ray fluorescence (XRF) spectroscopy. However, this technique is expensive and the sample preparation is time-consuming.
An alternative is to predict the elemental composition from information obtained by other spectroscopic methods like Fourier transform infrared (FTIR), near infrared (NIR) or Raman spectroscopy, which allow spectral measurements of large numbers of samples at a lower cost and with minimal sample processing (Robertson et al., 2016).
The combination of spectral data and multivariate analysis techniques has gained popularity for quantifying soil properties (Rial et al., 2016). These include statistical methods such as multiple linear regression, principal component regression, partial least squares regression (PLSR), and Random Forest (RF). There are some studies relating FTIR spectra and elemental soil compositions using PLSR (e.g., Mohanty et al., 2016), but the potential use of RF for this purpose has not been analysed.
In recent years, several researchers have described the mineralogy and properties of the soils in Northern Iran (Moradi et al., 2012; Saraskanroud et al., 2017). However, their elemental composition has not yet been studied. Thus, the main objective of this study was to investigate the composition of the most representative soils in Miandoab County (West Azerbaijan, Northern Iran) and to predict their elemental composition using a combination of FTIR spectroscopy and RF.

Material and Methods
The study area is located in Miandoab County, West Azerbaijan, northern Iran (36° 55' to 37° 0' N and 46° 0' to 46° 10' E), at an altitude that ranges from 1283 to 1308 m. Soil moisture and temperature regimes are aridic and mesic, respectively. The mean annual rainfall and evaporation are 272.3 and 753.3 mm, respectively, with a mean annual air temperature of 12.8 °C. The main land use in this area is agriculture, and the dominant crops are fruits and nuts such as apples, grapes and walnuts.
A total of 17 soil profiles were characterized in order to study the soils in the area. As a result, the following taxa were identified according to the USDA soil classification system (Soil Survey Staff, 2014): two soil orders (Aridisol and Inceptisol), four suborders (Calcids, Cambids, Argids and Xerepts), four great groups (Haplocalcids, Haplocambids, Haplargids and Haploxerepts) and four subgroups (Typic Haplocalcids, Typic Haplocambids, Typic Haplargids and Typic Haploxerepts).
In each profile, genetic soil horizons were identified and described, and composite samples of each horizon were obtained. In total, the number of samples collected was 44. The samples were transported to the laboratory, air-dried and ground to pass through a 2-mm mesh sieve and analysed for general soil properties.
Physical and chemical analyses of the fine-earth fraction were performed using standard methods. These included particle size distribution by the hydrometer method (Gee & Bauder, 1986), organic carbon content using a Flash EA 1112 elemental analyser (ThermoQuest, USA) after carbonate removal with HCl, calcium carbonate equivalent (CCE) using a titration method (Nelson & Sommers, 1996) and cation exchange capacity (CEC) by the Bower method (Bower et al., 1952).
Saturated paste extracts were prepared to measure soil pH using a pH meter (AZ-86502), electrical conductivity (EC) using an EC meter (AZ-8301), and soluble cations and anions by atomic absorption spectrophotometry using a Shimadzu AA-670 spectrophotometer (Shimadzu, Japan). The sodium adsorption ratio (SAR) was then calculated using Eq. 1 (Soil Survey Staff, 2014).
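The SAR calculation mentioned above can be sketched in a few lines. This is only an illustration in Python (the study itself used R), and it assumes the conventional SAR definition with all concentrations expressed in mmol_c L⁻¹; the function name and the example concentrations are hypothetical.

```python
import math


def sodium_adsorption_ratio(na, ca, mg):
    """Return SAR from soluble Na+, Ca2+ and Mg2+ concentrations.

    Assumption: all three concentrations are given in mmol_c/L (meq/L),
    as in the conventional SAR definition.
    """
    divalent = (ca + mg) / 2.0
    if divalent <= 0:
        raise ValueError("Ca + Mg must be positive")
    return na / math.sqrt(divalent)


# Hypothetical saturated-paste extract (values invented for illustration).
example_sar = sodium_adsorption_ratio(na=20.0, ca=6.0, mg=2.0)
print(round(example_sar, 2))
```

With these invented inputs the divalent term is (6 + 2) / 2 = 4, so the ratio is 20 / 2 = 10.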

$$\mathrm{SAR} = \frac{[\mathrm{Na}^{+}]}{\sqrt{\dfrac{[\mathrm{Ca}^{2+}] + [\mathrm{Mg}^{2+}]}{2}}} \qquad (1)$$

where the concentrations of the soluble cations are expressed in mmol_c L⁻¹.
The 44 air-dried samples were ground in an agate mortar, and 1 mg of soil was combined with 200 mg of potassium bromide, mixed and ground together before being pressed to produce a pellet. Absorbance of the pellets was measured in the MIR region (wavenumber range from 400 to 4000 cm⁻¹, with a resolution of 2 cm⁻¹) using an FTIR Bruker Vector 22 spectrophotometer (Bruker, USA) in transmission mode. The obtained spectra were baseline corrected and normalized using the extended multiplicative signal correction (EMSC) method (Kohler et al., 2005).
Based on the results of the physical and chemical analyses of all samples, five profiles were selected as the most representative of the study area, which included a total of 19 samples. Total elemental compositions were determined by X-ray fluorescence (XRF) for these samples using an S4 Pioneer XRF spectrometer (Bruker, USA).
Random Forest (RF) was used to find relationships between elemental composition as dependent variables and FTIR spectra as predictor variables, as well as to identify the spectral bands most relevant to elemental composition. Random Forest is a machine learning algorithm based on the calculation of multiple randomized regression trees (e.g., 500 to 2000 trees). It is becoming very popular because of its good modelling characteristics: 1) it is a non-parametric method; 2) it can be applied to datasets with a low number of observations and many predictors; 3) it can be applied to two-class and multi-class problems; 4) it is relatively robust to overfitting; 5) it ranks the most important variables for prediction; and 6) it has good performance, low bias, low variance, and low correlations between the individual trees (Wiesmeier et al., 2011).
For the construction of the RF model, the original data (n = 19) were split into two sets: a training set that included 75% of the data, which was used to fit the model, and a test set that comprised the remaining 25%, which was used for validation of the fitted model. The training set was resampled by bootstrapping, that is, sampled with replacement until its size was the same as that of the original dataset. Then, a regression tree was constructed with the resampled training set, and the non-sampled data, or "out-of-bag" (OOB) data, were used to estimate the prediction error. This process was repeated 500 times, and the final model was obtained by averaging the results of all fitted regression trees.
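The bootstrap and out-of-bag idea described above can be illustrated with a minimal Python sketch (the study itself used the randomForest R package, whose internals are not reproduced here). No trees are fitted; the sketch only draws the resamples and records which samples are left out each time. The function name and the random seed are hypothetical choices.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in_bag, out_of_bag) index lists."""
    # Sample indices with replacement until the resample has n_samples items.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Indices never drawn form the out-of-bag (OOB) set for this resample.
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_samples = 19          # number of XRF reference samples in the study
n_trees = 500           # number of repetitions reported in the text

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
print(f"mean OOB fraction over {n_trees} resamples: {mean_oob:.2f}")
```

On average roughly one third of the samples are left out of each resample, which is why OOB data can serve as an internal check on prediction error even with a small dataset.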
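To evaluate the performance of each RF model, the root mean square error (RMSE), coefficient of determination (R²) and OOB mean square error (MSE_OOB) were calculated according to the following equations (Eqs. 2, 3 and 4):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{p,i} - y_{o,i}\right)^{2}} \qquad (2)$$

$$R^{2} = 1 - \frac{SS_{error}}{SS_{total}} \qquad (3)$$

$$\mathrm{MSE}_{OOB} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{o,i} - y_{p,i}^{OOB}\right)^{2} \qquad (4)$$

where: y_p - estimated value (y_p^OOB when only out-of-bag predictions are used); y_o - observed value; n - number of samples; SS_error - sum of squared errors between estimated and observed values; and, SS_total - sum of squared deviations of each response variable from its mean.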
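As a small worked illustration of these goodness-of-fit measures, the Python sketch below computes RMSE and R² using the conventional definitions (R² as one minus SS_error over SS_total, RMSE as the square root of the mean squared error). The function names and the observed and predicted values are hypothetical and are not data from the study.

```python
import math


def sum_squares_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def sum_squares_total(observed):
    """Sum of squared deviations of the observed values from their mean."""
    mean_obs = sum(observed) / len(observed)
    return sum((o - mean_obs) ** 2 for o in observed)


def rmse(observed, predicted):
    return math.sqrt(sum_squares_error(observed, predicted) / len(observed))


def r_squared(observed, predicted):
    return 1.0 - sum_squares_error(observed, predicted) / sum_squares_total(observed)


# Hypothetical values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

Here SS_error is 0.5 and SS_total is 5.0, so R² is 0.9 and RMSE is the square root of 0.125.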
All data manipulations and analyses were performed in the RStudio environment (R Core Team, 2017). Baseline removal and spectral corrections were performed using the EMSC R package (Liland, 2017), and Random Forest models were constructed using the randomForest R package (Liaw & Wiener, 2002).

Results and Discussion
Some of the general properties of the five representative profiles are presented in Table 1. Calcium carbonate contents were high, which is common in arid and semi-arid areas, and generally increased with depth, most probably because of mobilization from the upper horizons during wet periods and re-deposition in deeper parts of the profiles.
EC values were higher in P1 and P4, and SAR values were higher in P1, P3 and especially P4, than in the other profiles because of the presence of a shallow water table. Soil reaction in all horizons ranged from moderately basic to slightly alkaline, as expected from the presence of CaCO₃, with pH values between 7.5 and 9.0. The pH increased with depth, except for the transition between the Bk2 and C horizons of P1, because of the accumulation of carbonates in subsurface horizons. Considering their pH, EC, SAR and CaCO₃ contents, P2, P3 and P5 were classified as non-saline-non-alkali, P1 was classified as saline and P4 was classified as saline-alkali.
The amount of organic carbon (OC) was low in all soils (< 20 g kg⁻¹), with the highest values being observed in the A horizon of P1. Clay contents were also low, except in P3 and P4. In P4, the translocation of clay from the surface formed argillic horizons that were the diagnostic feature of the soil.
Cation exchange capacity was low (< 25 cmolc kg⁻¹), except in P3, A, and in general decreased with depth, even in the Haplargid profile, which indicated that the CEC was controlled mainly by the presence of organic matter.
The XRF elemental analyses included 10 elements (Si, Al, Ca, Fe, Mg, K, Na, Ti, P and Mn). SiO₂ was the most abundant oxide, with values between 483.0 and 528.0 g kg⁻¹, followed by Al₂O₃, with values between 128.0 and 159.0 g kg⁻¹, and by Ca, Fe and Mg in decreasing order (Table 2).
The amounts of Al were higher than those that were reported for other soils in the country (Broomandi et al., 2017), which were found to be depleted in this element. The relatively high amounts of Ca and Mg were most likely a result of the presence of carbonates in all horizons of these soils.
The low coefficient of variation of the contents of major elements (Si and Al), which are the building blocks of silicate minerals, suggests that all these soils, despite belonging to different soil orders, had a very homogeneous silicate composition, and that variations with depth should not be expected to be pronounced.
Higher variability was observed for Ca and Mg composition, which was most likely a consequence of carbonate variations among soil profiles and horizons. A relatively high Na variability (Table 2) was also observed, which was attributed to the differences in soil sodicity shown in Table 1.
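The coefficient of variation used in this comparison can be computed as in the short Python sketch below. It assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the oxide contents are invented solely to contrast a homogeneous element with a variable one.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical contents in g/kg, invented for illustration only.
homogeneous_oxide = [490.0, 500.0, 510.0]
variable_oxide = [5.0, 10.0, 15.0]

print(round(coefficient_of_variation(homogeneous_oxide), 1))
print(round(coefficient_of_variation(variable_oxide), 1))
```

The first series has a standard deviation of 10 around a mean of 500 (CV of 2%), while the second has a standard deviation of 5 around a mean of 10 (CV of 50%).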
The FTIR spectra of the 19 reference samples after baseline and EMSC corrections are presented in Figure 1. In this figure, the spectra are shifted along the y-axis to avoid overlapping for illustrative purposes. Compared to other spectral regions, such as Vis-NIR, mid-infrared is considered better for the identification of inorganic and organic functional groups in soils, and therefore of soil constituents, due to their distinctive peaks (Hunt & Salisbury, 1970).
The spectra showed the characteristic bands of carbonates, as expected from the values for the general soil properties: a broad band around 1400-1500 cm⁻¹ with a peak at approximately 1440 cm⁻¹, and two smaller bands at 875 and 712 cm⁻¹ (Müller et al., 2014). This last band indicated the presence of calcite instead of dolomite.
The bands between 3600-4000 and 1000-1200 cm⁻¹ were mainly caused by silicates. The peak at 1030 cm⁻¹ corresponded to the Si-O-Si stretch vibration typical of these minerals (Madejová, 2003). However, the identification of silicates in a complex mixture, such as soil, is not straightforward, since characteristic peaks of different minerals often overlap.
Nevertheless, the peaks at 3692 and 3627 cm⁻¹ are characteristic of 1:1 clays, which, together with the absence of the typical combination of kaolinite peaks in the 900-1100 cm⁻¹ region, indicated that the dominant clay in these samples was likely halloysite (Joussein et al., 2005).
Table 2. Maximums, minimums, means, standard deviations and coefficients of variation (CV) of the elemental compositions measured by XRF
The band at 3682 cm⁻¹ was also characteristic of smectites, which are typical in arid and semi-arid soils, and thus very likely to be present in these samples. In addition, bands at 2850 and 2920 cm⁻¹ were characteristic of C-H aliphatic chain vibrations, which are well known to be related to organic matter in the soil (Calderón et al., 2011).
The interpretation of bands below 900 cm⁻¹ is complicated, especially in mixtures, such as soils, since the absorbances of many organic and inorganic compounds overlap in this region (Calderón et al., 2011). However, some of the peaks described in the literature, such as those at 800 and 780 cm⁻¹, have been identified as symmetric stretching vibrations of Si-O bonds, typical of silica and quartz (Müller et al., 2014).
Using the spectra from Figure 1 as independent variables, RF models were fitted to the major oxides described in Table 2. The R², MSE and RMSE for each oxide are presented in Table 3. Only Al₂O₃ and CaO were accurately estimated, with R² values of 0.8 and MSE of 0.2 and 1.1, respectively. TiO₂ was predicted with moderate accuracy, but all of the remaining oxides, including MgO, SiO₂, P₂O₅, MnO, Na₂O, K₂O and Fe₂O₃, were poorly predicted.
These results could be considered less satisfactory than those that were found by other authors, like Mohanty et al. (2016), who had an R² of 0.9 when modelling oxide contents in soils from different regions of India using mid-infrared data and PLSR.
Possible reasons for the weaker performance of the models in this study include the limited number of samples compared to those of Mohanty et al. (2016). However, PLSR is a statistical method that requires a large number of samples for modelling, and thus its use was not possible for our dataset. Another reason was the narrow range of values for most oxides, which is well known to be a limitation for statistical models. In any case, RF was probably the best option for modelling element contents despite its limited performance.
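The effect of a narrow range of values on R² can be seen with a small numerical example. The Python sketch below applies the same prediction errors to a wide and a narrow set of observed values and uses the conventional definition of R² (one minus SS_error over SS_total). All numbers are invented for illustration and are not data from this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_error / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_error / ss_total


# Identical hypothetical prediction errors applied to both datasets.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0]

# Hypothetical observed contents: same mean, very different spread.
wide_range = [100.0, 110.0, 120.0, 130.0, 140.0]
narrow_range = [118.0, 119.0, 120.0, 121.0, 122.0]

r2_wide = r_squared(wide_range, [y + e for y, e in zip(wide_range, residuals)])
r2_narrow = r_squared(narrow_range, [y + e for y, e in zip(narrow_range, residuals)])

print(round(r2_wide, 3), round(r2_narrow, 3))
```

With the same absolute errors (SS_error of 5 in both cases), R² is 0.995 for the wide range (SS_total of 1000) but only 0.5 for the narrow one (SS_total of 10), so a low R² for an oxide with little variation does not necessarily imply large absolute errors.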
The contents of the oxides that were satisfactorily modelled (Al₂O₃ and CaO) were predicted using the RF models in the remaining 48 soil samples. The results for the 17 sampled profiles, both measured and predicted, are presented in Figure 2.
Weathering processes in the soils lead to the depletion of elements that are considered mobile, such as Si, Ca or Mg, and an enrichment of those that are not mobile, such as Al, Ti, or Fe. Thus, the comparison of the behaviour of these elements can be used as an indication of the weathering and leaching of the soil (Bahlburg & Dobrzinski, 2011).
The CaO content gradually increased with depth in the Typic Haplocalcids (Figure 2D), as expected from the increase of carbonate content and formation of calcic horizons. The formation of these horizons is known to be caused by the release of Ca from the primary minerals and subsequent leaching before precipitating as carbonates. Yousefifard et al. (2012) found that in a semi-arid region of Iran, the percentage of Ca released from primary minerals that precipitates as secondary calcium carbonate to form calcic horizons can reach up to 70-90%. The values of CaO also increased in some profiles of the Typic Haplocambids (Figure 2B), although in this case the migration of CaO was not yet marked enough to develop calcic horizons.
On the contrary, there was no obvious trend in CaO distribution for some profiles of the Typic Haploxerepts (Figure 2A). In the Typic Haplargids, the most prominent feature was the decrease of CaO from the A horizon to the Bt horizon. In general, CaO values are positively correlated with soil leaching (Wang et al., 2002), and thus it seems that leaching is a more important process in Haplocalcids and Haplocambids, while it is not so relevant in Haplargids and Haploxerepts. In a weathering environment controlled by leaching, a consistent decrease of Al with depth could also be expected. Figure 2E, F, G and H show that, for each profile, the differences in Al composition among the horizons were not large, and a consistent decrease of Al₂O₃ with depth was only observed in the Haplocambids. The results suggest that these soils were at an early stage of development and weathering was not intense enough to produce significant Al variations with depth.
In fact, in the Haploxerepts, and especially in the Haplargids, a decrease in the Al content was observed with the transition from the A horizon to the Bw (Haplocambids) or Bt1 (Haplargids) horizons. This decrease was most likely caused by an increase in clay content with depth, since Al₂O₃ contents in phyllosilicate clay minerals are higher than those that have been found in primary aluminosilicates such as feldspars (Muhs, 2001).
In summary, and based on the distribution of elements in the soil profiles, soils in Miandoab County are young. Some leaching was taking place in the profiles, mobilizing the most soluble elements (Ca), but the soils were at an early stage of development and did not show obvious differences in less mobile elements (Al).
In addition to the goodness-of-fit parameters, RF provided the most relevant wavenumbers for the prediction of each element. Those wavenumbers are presented in Figure 3 for the oxides that showed a good fit, Al₂O₃ and CaO. The most relevant bands for the prediction of Al₂O₃ were 595 cm⁻¹, followed by 561 and 557 cm⁻¹. In fact, 9 of the 10 most important bands are in the region between 500 and 650 cm⁻¹, which has been ascribed to a combination of out-of-plane OH bending vibrations and gibbsite and alumino-silicate Al-O vibrations (Ali & Padmanabhan, 2017).
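The way such a ranking is read can be sketched as follows: sort the wavenumbers by their importance score, keep the top k, and count how many fall inside a spectral window. This Python sketch is only illustrative; the importance scores are invented, the function names are hypothetical, and the importance measure reported by the randomForest R package is not reproduced here.

```python
def top_bands(importance, k):
    """Return the k wavenumbers with the largest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [wavenumber for wavenumber, _ in ranked[:k]]


def count_in_range(wavenumbers, low, high):
    """Count wavenumbers lying inside the closed interval [low, high]."""
    return sum(1 for w in wavenumbers if low <= w <= high)


# Hypothetical importance scores keyed by wavenumber (cm-1); values invented.
importance = {595: 0.9, 561: 0.8, 557: 0.7, 1030: 0.2, 3627: 0.1, 875: 0.05}

top3 = top_bands(importance, 3)
print(top3, count_in_range(top3, 500, 650))
```

In this invented example all three top-ranked bands fall between 500 and 650 cm⁻¹, mirroring the kind of concentration of important bands described for Al₂O₃.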
In the case of CaO (Figure 3B), the interpretation of the important spectral bands was more complex. The 10 most important ones included those at 865, 883, 869 and 867 cm⁻¹, which were part of a broad peak with a maximum at 875 cm⁻¹, characteristic of carbonates. This band has been found useful for carbonate quantification by other authors (Tatzber et al., 2007).
The bands at 3093 and 3095 cm⁻¹ have also been ascribed to carbonates, but their meaning is uncertain. The bands at 1029 and 1593 cm⁻¹ were related to silicates, and this could be derived from the presence of Ca in the exchangeable complex adsorbed to the clay surfaces.
Thus, for Al and Ca, the combination of FTIR spectra and RF not only allowed the creation of models that can be used to predict the contents of these elements in other soil samples of the area by simply measuring their FTIR spectra, but it also allowed the amounts and variability of these two elements to be related to the soil components in which they are most abundant: carbonates in the case of Ca, and silicates in the case of Al.

Conclusions
1. The Fourier transform infrared spectra reflected the composition of these soils, with distinctive signatures, including the presence of calcite and low-activity clays, mainly halloysite.
2. The combination of Fourier transform infrared spectra and Random Forest models provided adequate predictions of two of the analysed oxides, Al₂O₃ and CaO. The important spectral bands for the prediction of these oxides were related to meaningful functional groups and soil components.
3. The interpretation of the measured and predicted profiles indicated that the soils in Miandoab County, West Azerbaijan, Iran, are at an early stage of development. In each profile, the elemental composition was very similar for all horizons, and a consistent enrichment of CaO with depth was only observed in the Typic Haplocalcid.

Acknowledgments
The XRF and FTIR spectroscopy analyses and part of the soil analysis were performed during a sabbatical leave by the first author at the University of A Coruna. The authors are deeply grateful to the faculty and staff members of the University of A Coruna for all the help and assistance. Financial