Multivariate statistical analysis applied to the evaluation of groundwater quality in the central-southern portion of the state of Bahia-Brazil

The objective of this study was to identify and evaluate the variables that contribute to possible natural and/or human contamination of groundwater in the semiarid region of the state of Bahia, in order to support water quality monitoring and management actions in the area. To do so, two multivariate techniques were used: factorial analysis by the principal component method and cluster analysis. The factorial analysis grouped the variables into two principal factors that explained 93% of the total accumulated variance. These variables were strongly related to concentrations of metals and salinity in the water. The cluster analysis classified water sources according to water quality into three clusters for each factor. The natural background of the rocks of the municipality of Boquira was shown to influence water resources. Continuous monitoring (during dry and rainy seasons) of water quality in wells and springs located upstream and downstream from contamination sources is recommended, even if these waters are not used for public supply, to detect possible contamination plumes from contaminated material.


INTRODUCTION
The municipality of Boquira is located in the central-southern portion of the state of Bahia, within the semiarid region of the state, under the geological domain of the Paramirim deformation corridor (Alkimim et al., 1993). Among other lithotypes, this corridor comprises rocks of the Boquira Unit, which is widely known for lead-zinc mineralization, providing this region with a high natural background for these metals. In fact, the largest lead and zinc mine in Brazil was located in this area. It operated for over 30 years (1960 to 1992) until it was suddenly abandoned, leaving significant environmental liability in the form of particulate material composed of toxic metals, such as lead, zinc, silver, barium, copper, chromium, nickel, arsenic, cadmium, among others, deposited in the tailings pond and galleries of the underground mine (Daltro, 2017).
Although mineral extraction has been completely interrupted for over 20 years in the area, the recovery plan for degraded areas was never implemented. Even with the imminent risk of contamination, the municipal landfill was installed on the surface of the pile of tailings. In addition, the presence of urban districts, recyclable material pickers in the landfill, and rural establishments near the place where the pile of tailings is located may result in a public health issue, therefore demonstrating a situation of environmental injustice (Andrade et al., 2017).
During the past years, several studies have been conducted with the objective of evaluating possible environmental impacts regarding an increase in the concentration of heavy metals in the environment. These metals originate from lithogenic processes and/or human activities, such as the use of fertilizers in agricultural zones and mining activities (Muniz and Oliveira Filho, 2006).
Studies on the evaluation of groundwater quality generally use several variables, which, in turn, are strongly correlated, thus hampering the understanding of their interrelationships. The use of multivariate analysis techniques allows a reduction in the number of variables, definition of their relationships, identification of those that are responsible for the dispersion of observations, and classification of clusters (Brito et al., 2006).
Multivariate analysis helps to define which variables are most important for water management, supporting the selection of variables with more objective criteria. This type of statistical analysis is an effective tool in the qualitative evaluation of waters (Vidal and Kiang, 2002; Brito et al., 2006; Cloutier et al., 2008; Palácio, 2009; Fernandes et al., 2010; González et al., 2011; Finkler et al., 2015; Gomes and Cavalcante, 2017). Therefore, multivariate techniques (factorial analysis by the principal component analysis method and hierarchical clustering analysis) were used to evaluate which variables (metals) contribute most to possible natural and/or human contamination of groundwater, with the aim of supporting groundwater quality monitoring and management actions in the semiarid region of the state of Bahia.
Rev. Ambient. Água vol. 15 n. 1, e2408 - Taubaté 2020
Two multivariate analysis techniques were used in this study: factorial and hierarchical clustering analyses, which were processed using the software SPSS Statistics, version 17.0.
The factorial analysis describes the correlation or covariance among a set of variables in terms of a limited number of non-observable variables. These non-observable variables, or factors, are calculated as linear combinations of the original variables.
The factorial analysis was based on three phases: calculation of the correlation matrix between variables (R-mode factor analysis), extraction of initial factors and rotation of the matrix.
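The three phases listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the SPSS procedure used in the study: the function names `pca_loadings` and `varimax` and the synthetic data are assumptions introduced here, and SPSS additionally applies Kaiser normalization during rotation by default, which is omitted.

```python
import numpy as np

def pca_loadings(X, n_factors):
    """Phases 1-2: correlation matrix, then principal-component loadings."""
    R = np.corrcoef(X, rowvar=False)            # R-mode: correlations between variables
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)       # eigenvector scaled by sqrt(eigenvalue)
    explained = eigvals / R.shape[0]            # share of total variance per factor
    return loadings, explained

def varimax(L, max_iter=100, tol=1e-8):
    """Phase 3: orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = L.shape
    T = np.eye(k)                               # rotation matrix, kept orthogonal
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        G = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        crit = s.sum()
        if crit_old != 0 and crit / crit_old < 1 + tol:
            break
        crit_old = crit
    return L @ T

# Synthetic data (NOT the study's measurements): 40 samples, 6 variables, 2 latent factors
rng = np.random.default_rng(42)
F = rng.standard_normal((40, 2))
W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
X = F @ W.T + 0.3 * rng.standard_normal((40, 6))

loadings, explained = pca_loadings(X, n_factors=2)
rotated = varimax(loadings)
communalities = (rotated ** 2).sum(axis=1)      # variance of each variable explained by the factors
```

Because the rotation is orthogonal, the communalities are the same before and after rotation; only the distribution of loadings across factors changes.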
The adequacy of the data for factor analysis was assessed on the correlation matrix using the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and Bartlett's test of sphericity, which tests the null hypothesis that the variables analyzed are not correlated (Hair Jr. et al., 1998). Principal Component Analysis (PCA) was used to extract the factors from the correlation matrix, and the factors were rotated by the varimax method, the most widely used orthogonal rotation, which minimizes the number of variables with high loadings on different factors and thus allows each variable to be associated with a single factor. This procedure aims to describe the covariance relationships among correlated parameters in terms of the identified factors, and to observe through communalities how much of each parameter's variance is explained by the factors (Hoffmann, 1992; Manly, 1998; Landim, 2011).
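For readers unfamiliar with the two adequacy checks, the sketch below computes the overall KMO index and Bartlett's sphericity statistic from a correlation matrix using their standard textbook formulas. The function names and the equicorrelated example matrix are hypothetical and chosen only for illustration; they are not the study's data.

```python
import numpy as np
from scipy.stats import chi2

def kmo(R):
    """Overall Kaiser-Meyer-Olkin index from a correlation matrix R."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)            # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)       # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

def bartlett_sphericity(R, n):
    """Bartlett's test that R is an identity matrix; n is the number of samples."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df, chi2.sf(stat, df)          # p-value from the chi-square upper tail

# Hypothetical example: 4 variables, every pairwise correlation equal to 0.8, 50 samples
R = np.full((4, 4), 0.8)
np.fill_diagonal(R, 1.0)

kmo_value = kmo(R)
stat, df, p_value = bartlett_sphericity(R, n=50)
```

A small p-value rejects the hypothesis of uncorrelated variables, and a KMO closer to 1 indicates that the correlations are compact enough for factoring.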
Lastly, the hierarchical clustering of samples (Q-mode factor analysis) was conducted using the highest number of variables explained by a single factor in the factorial analysis. In this technique, Ward's method was used as the hierarchical clustering criterion, with similarity measured by the squared Euclidean distance. This clustering criterion uses the total sum of the squared deviations of each object from the mean value of the group into which it was placed. This criterion was chosen based on its frequent application in studies on water quality (Vega et al., 1998; Andrade et al., 2008; Fernandes et al., 2010; Salgado et al., 2011; Gomes and Cavalcante, 2017).
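A minimal sketch of Ward clustering with SciPy follows, assuming synthetic two-variable data with three well-separated groups rather than the study's samples. Two choices here are assumptions and not taken from the study: the variables are standardized to z-scores, and SciPy's `ward` linkage works on plain Euclidean distances, which should give the same merge order as Ward's criterion on squared Euclidean distances but reports different merge heights.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (NOT the study's measurements): 3 groups of 4 samples, 2 variables
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((4, 2)) for c in centers])

# Standardize each variable so that units do not dominate the distance (assumption)
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

Z = linkage(Xz, method="ward")                     # agglomeration schedule
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 clusters
```

Each row of `Z` records one merge (the two clusters joined, the merge height, and the size of the new cluster), which is the information a dendrogram displays.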
The adequacy of groundwater for human consumption was evaluated for the clusters formed, primarily on the basis of Ordinance No. 5 of 28/09/2017 of the Ministry of Health (Brasil, 2017).

RESULTS AND DISCUSSION
The factorial analysis by the principal component method was initially performed with 30 physicochemical variables (Al, As, B, Ba, Be, Ca, Cd, Co, Cr, Cu, Fe, Hg, K, Li, Mg, Mn, Mo, Na, Ni, Pb, Sb, Si, Sn, Sr, Ti, V, Zn, pH, EC and DO) analyzed in July 2013. Five simulations were necessary to achieve a satisfactory result, taking into account the criteria adopted for this analysis, significantly reducing the number of variables in the last simulation. The final simulation resulted in 9 variables (B, Ba, K, Mg, Na, Si, Sr, Ca and EC) and presented two factors that adequately described the variation of the data (Tables 1 and 2), with KMO value 0.600.
The correlation analysis applied to the variables of quality of groundwater samples showed that most of them were strongly correlated (Table 1) with high statistical significance (p ≤ 0.01).
The principal component factorial analysis applied to the groundwater samples condensed the variables analyzed into two ordered factors, explaining 93% of total variance. Factor 1 (F1) alone accounted for 75% of total variance. The variables with the highest factor loadings on this factor were B (0.919), Na (0.921), and Mg (0.897), although the remaining ones also presented a strong relationship, given the high factor loadings and final communalities observed. Factor 2 (F2), in turn, accounted for 18% of total variance and also included variables with high factor loadings, such as Ba (0.971) and Sr (0.954) (Table 2). According to Andrade (1989), as cited by Brito et al. (2006), variables with high factor loadings are considered representative, and these loadings must always be above 0.300. Factor 1 (75% of data variance), represented by the variables boron (B), potassium (K), magnesium (Mg), sodium (Na), and electrical conductivity (EC), was strongly related to the concentrations of metals and salinity in the water. Factor 2 (18% of data variance), which comprised the variables barium (Ba), silicon (Si), strontium (Sr), and calcium (Ca), was strongly correlated with concentrations of metals in the water (Table 2). Similar results were found by Celino and Rangel (2007).
Soil weathering and leaching are examples of natural processes that release heavy metals into waters and soils. However, metal extraction and processing, industrial tailings, domestic sewage, agricultural inputs, disposal of commercial products, burning of fossil fuel, and disposal of sewage sludge are human activities associated with environmental contamination by these metals (Nriagu and Pacyna, 1988; Teixeira et al., 2000; Alleoni et al., 2005; Guilherme et al., 2005, as cited by Muniz and Oliveira Filho, 2006). Alves et al. (2017) observed that the main lead-carrying minerals are found in advanced stages of weathering, eventually altering into lead oxides. The inadequate disposal of this type of tailing may create favorable conditions for lead release from the structure of the carrying minerals, consequently threatening the environment and the population living in the surroundings of the tailings pond.
The hierarchical clustering analysis applied to the quality of groundwater data allowed the classification of water sources into three clusters with chemically similar characteristics in each factor (Figures 2 and 3).
The number of clusters was defined based on the first large difference among re-scaled clustering coefficients. These coefficients revealed cutoff point 2 (higher precision), where the formation of three homogeneous clusters was observed for the sampling period in each factor.
According to the variables of Factor 1 (B, K, Mg, Na, EC) (Figure 2A), three similar clusters were generated, comprising 55% (3 wells and 3 natural springs), 18% (2 wells), and 27% (3 wells) of the samples analyzed in clusters 1, 2, and 3, respectively (Figure 3A).
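The cutoff rule and the cluster shares quoted above can be illustrated as follows. The merge heights below are hypothetical values invented for an 11-sample dendrogram, and picking the single largest gap is a simplification of the "first large difference" criterion; only the cluster sizes (6, 2, and 3 samples) come from the text. Any linear re-scaling of the heights leaves the position of the largest gap unchanged.

```python
import numpy as np

def clusters_at_largest_gap(heights, n_objects):
    """Number of clusters left when the tree is cut at the largest jump in merge height."""
    h = np.sort(np.asarray(heights, dtype=float))
    j = int(np.argmax(np.diff(h)))       # largest gap lies after merge j + 1
    return n_objects - (j + 1)           # each merge reduces the cluster count by one

# Hypothetical merge heights for 11 samples (10 merges), NOT taken from the study
hypothetical_heights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 6.0, 9.0]
k = clusters_at_largest_gap(hypothetical_heights, n_objects=11)

# Cluster sizes for Factor 1 as reported in the text
counts = {1: 6, 2: 2, 3: 3}
total = sum(counts.values())
shares = {c: round(100 * n / total) for c, n in counts.items()}
```

With 11 samples in total, the reported sizes reproduce the 55%, 18%, and 27% shares.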
Cluster 1 (NS1, NS2, NS3, TW4, TW5, TW6) was characterized by waters with low concentrations of metals and salinity. In this group, boron (B) concentration ranged between 0.01 and 0.02 mg L⁻¹, which means that all samples were below the  (2011) do not define maximum permissible values for the concentration of K and for EC. These results were similar to those found in the study by Daltro (2017).
Cluster 2 (TW3, TW8) was characterized by waters with intermediate concentrations of metals and salinity, presenting a concentration of boron (B) between 0.04 and 0.10 mg L⁻¹, which is below the maximum permissible value according to the CONAMA Resolution No. 396/2008 (0.5 mg L⁻¹). Magnesium (Mg) ranged between 43.00 and 113.00 mg L⁻¹, and one sample (TW3, Boquira Unit) was above the maximum permissible value according to WHO (2011), which is 50 mg L⁻¹. Sodium (Na) varied between 77.20 and 86.90 mg L⁻¹, therefore being within the permissible limit according to Ordinance No. 5 of 28/09/2017 of the Ministry of Health and the CONAMA Resolution No. 396/2008 (200 mg L⁻¹). Potassium (K) varied between 9.57 and 12.50 mg L⁻¹, and EC was between 1,335 and 1,357 μS/cm. Again, Ordinance No. 5 of 28/09/2017 of the Ministry of Health, the CONAMA Resolutions Nos. 396/2008 and 357/2005, and WHO (2011) do not define maximum permissible values for the concentration of K and for EC. These results also corroborate those found by Daltro (2017).
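The comparison against permissible limits described above amounts to a simple lookup, sketched below with the Cluster 2 ranges and the limits quoted in the text. The dictionary layout and the function name `exceedances` are assumptions made for illustration; variables without a defined limit (such as K) are skipped.

```python
# Maximum permissible values in mg/L as quoted in the text; None means no limit is defined
LIMITS_MG_L = {"B": 0.5, "Mg": 50.0, "Na": 200.0, "K": None}

def exceedances(ranges, limits):
    """Return {variable: (max observed, limit)} for variables whose maximum exceeds the limit."""
    flagged = {}
    for var, (low, high) in ranges.items():
        limit = limits.get(var)
        if limit is None:                # no standard to compare against
            continue
        if high > limit:
            flagged[var] = (high, limit)
    return flagged

# Observed (min, max) ranges for Cluster 2 of Factor 1, in mg/L, as reported in the text
cluster2 = {
    "B": (0.04, 0.10),
    "Mg": (43.00, 113.00),
    "Na": (77.20, 86.90),
    "K": (9.57, 12.50),
}

flagged = exceedances(cluster2, LIMITS_MG_L)
```

For these ranges only magnesium is flagged, consistent with the TW3 sample exceeding the WHO (2011) value.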
Cluster 1 (NS1, NS2, NS3, TW5, TW6) of Factor 2 comprised the same samples as Cluster 1 of Factor 1, except for sample TW4. This cluster was characterized by good-quality waters, with a concentration of barium (Ba) that ranged between 0.01 and 0.05 mg L⁻¹.
All samples were within the maximum permissible value (0.7 mg L⁻¹) for this element according to Ordinance No. 5 of 28/09/2017 of the Ministry of Health and the CONAMA Resolution No. 396/2008. The concentration of calcium (Ca) ranged between 0.98 and 2.41 mg L⁻¹, and was also within the maximum permissible limit according to WHO (2011), which is 75 mg L⁻¹. Ordinance No. 5 of 28/09/2017 of the Ministry of Health and the CONAMA Resolutions Nos. 396/2008 and 357/2005 do not define a maximum permissible value for the concentration of calcium. Results regarding silicon (Si) and strontium (Sr) ranged between 6.88 and 11.50 mg L⁻¹, and between 0.01 and 0.07 mg L⁻¹, respectively. Ordinance No. 5 of 28/09/2017, the CONAMA Resolutions Nos. 396/2008 and 357/2005, and WHO (2011) do not define maximum permissible values for the concentrations of Si and Sr. Similar results were also presented in the study by Daltro (2017).
Cluster 2 (TW4) of Factor 2 was characterized by lower quality waters compared with Cluster 1, but still higher quality than in Cluster 3. In Cluster 2, barium (Ba) concentration was 0.02 mg L⁻¹, which means this sample was within the maximum permissible value (0.7 mg L⁻¹) according to Ordinance No. 5 of 28/09/2017 of the Ministry of Health (MH) and the CONAMA Resolution No. 396/2008. In turn, the concentration of calcium (Ca) was 91.10 mg L⁻¹, therefore above the permissible limit according to WHO (2011).
Cluster 3 (TW1, TW2, TW3, TW7, TW8) of Factor 2 was characterized by lower quality water compared with Clusters 1 and 2, presenting concentrations of barium (Ba) that ranged between 0.05 and 1.26 mg L⁻¹. Two samples (TW7, Boquira Granite, and TW8, Serra do Espinhaço) were above the maximum permissible value (0.7 mg L⁻¹) for this element according to Ordinance No. 5 of 28/09/2017 of the Ministry of Health and the CONAMA Resolution No. 396/2008. The concentration of calcium (Ca) varied between 202.00 and 303.00 mg L⁻¹, which means all samples were above the permissible limit according to WHO (2011), which is 75 mg L⁻¹. Silicon (Si) and strontium (Sr) varied between 30.40 and 52.40 mg L⁻¹, and between 0.45 and 2.50 mg L⁻¹, respectively. As previously mentioned, Ordinance No. 5 of 28/09/2017 of the Ministry of Health, the CONAMA Resolutions Nos. 396/2008 and 357/2005, and WHO (2011) do not define maximum permissible values for the concentrations of Si and Sr.
The variables of Factor 1, which are indicators for the concentration of heavy metals and salinity in waters, suggest that the water exploited from wells and natural springs that were identified as belonging to Cluster 1 meets drinking water standards regarding the concentration of B, Mg, and Na ions. These wells and springs capture water from the Boquira Unit and Serra do Espinhaço. Only two wells (TW3 and TW8) were identified as belonging to Cluster 2. They also exploit water from the Boquira Unit and Serra do Espinhaço, but in these wells, waters are more mineralized (higher EC) than those of the previous cluster. In addition, the TW3 well presented Mg concentrations above the maximum permissible value. In turn, the wells and natural springs of Cluster 3, which exploit waters from the Boquira Granite, presented highly mineralized waters, with high concentrations of metals. These waters did not meet drinking water standards regarding the concentration of Mg and Na ions. In addition, two wells (TW1 and TW2) are located in the surroundings of the tailings pond of the mine.
The variables of Factor 2, which are indicators of concentrations of metals in waters, suggest that the wells and natural springs identified as belonging to Cluster 1 exploit better quality waters in relation to Ba, Si, Sr, and Ca ions, and are geologically located in the Boquira Unit and Serra do Espinhaço. The well identified as belonging to Cluster 2 (TW4), which also exploits water from the Boquira Unit, presented lower quality waters than Cluster 1, though higher than Cluster 3, with concentrations of calcium above drinking water standards. In turn, wells and natural springs of Cluster 3, where waters are captured from the Boquira Granite, Boquira Unit, and Serra do Espinhaço, presented lower quality waters than the previous groups, not meeting drinking water standards regarding the concentration of Ba and Ca ions. These are highly mineralized waters, with high concentrations of metals. Cunha et al. (2016) analyzed the mine's ramp and observed high levels of lead (Pb), zinc (Zn), cadmium (Cd), nickel (Ni), cobalt (Co), strontium (Sr), magnesium (Mg), and calcium (Ca). This may be explained by the large amount of material from the tailings pond deposited in the galleries of the underground mine during the period it was active and after mining activities were abandoned.
According to Daltro (2017), the groundwater flow in the municipality of Boquira is preferentially oriented W-NE and, specifically in the study area, sampling points were located upstream in relation to the main human contamination sources (tailings pond, galleries of the underground mine, and open pit mine).

CONCLUSIONS
The factorial analysis allowed the classification of the most significant variables for water quality, prioritizing those that were strongly related with concentrations of metals and salinity in waters.
The clustering analysis classified water sources according to water quality, resulting in three clusters for each factor. This allowed the conclusion that the natural background of the rocks of the municipality of Boquira influences water resources, with values above the maximum permissible limits for human consumption found for metals such as magnesium (Clusters 2 and 3 of Factor 1), calcium (Clusters 2 and 3 of Factor 2), and barium (Cluster 3 of Factor 2).
Although the highest concentrations for metals such as lead, zinc, and cadmium were located in the waters of the underground mine galleries and tailings pond, high contents of these metals were not found in the groundwater samples analyzed.
Continuous monitoring (rainy and dry seasons) of the quality of water from wells and springs located upstream in relation to contamination sources is recommended, even if these waters are not used for public supply, with the objective of detecting possible contamination plumes from contaminated material.
The results of these multivariate analyses are important to support quality monitoring and management of groundwater, especially in regions that present high socio-environmental vulnerability, such as the municipality of Boquira.