Water quality index using multivariate factorial analysis

The evaluation of the environmental effects of agricultural production on water quality became essential in Brazil after the creation of policies for the use and conservation of water resources. To this end, water quality indices have been used to show the spatial and temporal variation of water quality in a watershed. The objective of this study was to develop a water quality index (WQI) by applying the Multivariate Factorial Analysis (MFA) statistical technique, which could indicate the influence of agricultural activities on the quality of water resources. Water in a predominantly agricultural watershed was monitored from Sept. 2003 to Sept. 2004. Monthly water collections were carried out at six sample points, and eight parameters were analyzed: nitrate, ammoniacal nitrogen, ammonia, total phosphorus, electrical conductivity, pH, suspended solids and turbidity, which were considered important due to the agricultural management adopted in the region. Results indicated contamination of agricultural origin along the basin. Factorial analysis showed that the ammonia, ammoniacal nitrogen and nitrate parameters contributed most to determining the WQI.


INTRODUCTION
The reduction in the quantity and quality of water resources has become a worldwide concern, including in countries with great water potential such as Brazil, since the availability of water is one of the main factors that limit socioeconomic development. Merten & Minella (2002) observed that the concept of water quality does not necessarily refer to a state of purity of the water, but to the chemical, physical and biological characteristics that determine its different uses. According to Sperling (1996), because the number of parameters involved makes the process of qualifying a water resource complex, water quality indices are proposed to summarize the analyzed variables and express them in a single number, with the objective of showing the temporal and spatial evolution of water quality.
The search for an indicator that best characterizes a water source under study requires the use of statistical techniques. According to Haase et al. (1989), one of the methods used in the formulation of water quality indices is based on the multivariate factorial analysis technique, which was used by Carvalho et al. (2000) to evaluate the water quality of a watershed. The main objective of this analysis is to study the correlation structure of an initial set of "p" variables (X1, X2, ..., Xp), replacing it with a smaller set of hypothetical variables which, being fewer in number and simpler in structure, explain most of the variation in the original variables.
This technique reveals the behavior of the data by reducing the dimension of the original parameter space, thus permitting the selection of the most representative variables for the water resource being analyzed, as explained by Andrade et al. (2007). Factorial analysis thus permits the definition of more sensitive indicators, which can facilitate a monitoring program as well as the evaluation of changes that have taken place in the water resources. Toledo & Nicolella (2002) demonstrated that an index based on the factorial analysis technique is very useful when one intends to evaluate changes that have taken place along the watershed axis. However, indices based on statistical techniques cannot be generalized to all water sources, since each water system has its own peculiar characteristics (Haase et al., 1989).
Irrigated agriculture can modify the quantity and/or the quality of water resources, affecting their availability for several purposes. Irrigation is an agricultural technique that is increasingly used to raise crop productivity. According to Paz et al. (2000), the development of irrigated agriculture requires technological and economic procedures that optimize water use, improving system application efficiency and yield gains based on the crop response to water application and other inputs, without compromising the availability and quality of the resource.
The objective of this study was to develop a water quality index (WQI) by applying the Multivariate Factorial Analysis (MFA) statistical technique, which could indicate the effects of agricultural activities on the quality of a watershed's water resources.

MATERIAL AND METHODS
The Water Quality Index (WQI) was established using multivariate factorial analysis applied on the water quality data of a watershed, monitored during a 13-month period.

Characterization of the watershed
The Rio das Pedras watershed, located in the cities of Mogi Guaçu and Estiva Gerbi, SP, and belonging to the Mogi-Guaçu River watershed, was chosen as the study area. This watershed was selected due to the significant tomato crop production in the area, with constant applications of agrochemicals associated with furrow irrigation, which is known for its low application efficiency and water losses. This combination is considered a potential generator of diffuse loads that are harmful to the availability of water resources. Thus, six water collection points were chosen, taking into consideration the location of urban nuclei. Care was taken not to gather samples from regions near city districts, since the objective was to evaluate only the agricultural impacts on water quality in the Rio das Pedras Watershed. In order to facilitate the location of the selected points, since the distances between points were relatively large, and to permit future research to expand the database, GPS equipment was used to georeference the sample collection points. Figure 1 shows the hydrological map for the Rio das Pedras Watershed in 2004 and the location of the water sample collection points.

Monitoring the climate
An automated Campbell meteorological station with a CR10X datalogger was used to record weather conditions. It was installed in the watershed area over bare ground, following the conditions set by technical standards (EPA, 2000), which recommend a minimum distance from buildings, fences, trees and power lines. Only atmospheric precipitation values (mm h⁻¹ or mm d⁻¹) were used in this study, despite the potential to record other weather data.

Parameters for monitored water quality and methods of analysis
Water sample collections were carried out monthly in the field using new plastic bottles with a total volume of 500 mL, and three samples were always gathered from each point. The samples were kept cold from collection to analysis, as indicated in Standard Methods (APHA, 1995).
The analyzed parameters were: Electrical Conductivity (CE), pH, Ammoniacal Nitrogen (NIA), Ammonia (AMO), Nitrate (NIT), Total Phosphorus (FOT), Suspended Solids (SS) and Turbidity (TU). The laboratory analyses were always performed the day after the collection. Same-day analyses were performed only for turbidity and suspended solids; pH and electrical conductivity were measured in the field. For the purpose of assessing the organic contamination of water sources, dissolved oxygen (DO) was also included as a monitored parameter in the experiment. However, due to technical problems with the equipment, the collected data were unreliable and the measurement was discontinued.

Christiane Coletti et al.
It is important to emphasize that the estimation of dissolved oxygen is essential in water quality analysis and that its inclusion is required in this type of research.
Analyses of suspended solids, turbidity, pH and electrical conductivity were performed according to Standard Methods (APHA, 1995). Colorimetric determinations were used for the analyses of nitrate, ammoniacal nitrogen, ammonia and total phosphorus in water, using the following reagents: NitraVer 5, Nessler reagent, PhosVer 3 and potassium persulfate, together with a digester and a spectrophotometer; the latter is a method accepted by the USEPA (United States Environmental Protection Agency) for monitoring receiving water bodies and discharges from Wastewater Treatment Systems.
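The factorial model can be written as (Eq. 1):

$$z_{ij} = \sum_{p=1}^{m} a_{jp} F_{pi} + u_j Y_{ji} \qquad (1)$$

where: $a_{jp} F_{pi}$ is the contribution of common factor $p$ to the linear combination (with $m$ denoting the number of common factors), and $u_j Y_{ji}$ is the residual error in the representation of the observed measurement $z_{ij}$.
The model was fitted using the average of the results from the three individual samples collected at each of the six points distributed over the watershed, for the eight monitored parameters over the thirteen months of the experiment.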
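The data preparation described above (averaging the three samples and working with standardized measurements) can be sketched as follows. This is only an illustrative sketch: the nitrate readings are hypothetical, the function names are invented for the example, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def average_replicates(replicates):
    """Average the replicate readings taken at one point on one date."""
    return mean(replicates)


def standardize(values):
    """Return z-scores (zero mean, unit standard deviation) for one variable.

    Assumption: population standard deviation; the source does not specify.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical nitrate readings (mg/L): three replicates per point-month.
nitrate_replicates = [
    [1.0, 1.2, 1.1],
    [2.0, 2.1, 1.9],
    [0.5, 0.4, 0.6],
    [3.0, 3.2, 2.8],
]
nitrate_means = [average_replicates(r) for r in nitrate_replicates]
nitrate_z = standardize(nitrate_means)
```

Each variable would be processed in the same way, giving one standardized column per parameter.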
The elaboration of the WQI using this technique required three phases: a) preparation of the correlation matrix; b) extraction of the common factors and the possible reduction of space, and c) the rotation of axes related to the common factors, aiming for a simple and easily interpreted solution.
First, a basic matrix was built without any gaps, a requirement of the factorial technique being employed. From the matrix of original data, a correlation matrix was obtained using the Spearman Correlation Coefficient, which revealed the linear dependency among the variables being studied (Steel & Torrie, 1980; Bhattacharyya & Johnson, 1977).
Common factors were then found using the procedure called Factor Analysis. The number of factors was determined based on the percentage of the total variance of the variables explained by the set of factors, together with their representativeness of the actual study situation.
With the intent of obtaining a simpler factorial load matrix, the axes were rotated using the Varimax Procedure. Since the rotated solution brought no improvement, the unrotated axes were used.
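As an illustration of this step, a Spearman correlation matrix can be computed as the Pearson correlation of the ranked data. The sketch below uses only the standard library; the variable values are hypothetical and the function names are invented for the example, not taken from the study.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(columns):
    """Build a nested dict of Spearman coefficients for named columns."""
    names = list(columns)
    return {a: {b: spearman(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical, gap-free data: one value per sampling occasion.
data = {
    "NIA": [0.1, 0.4, 0.2, 0.8, 0.5],
    "AMO": [0.12, 0.50, 0.25, 0.95, 0.60],
    "TU": [30, 12, 25, 8, 15],
}
matrix = correlation_matrix(data)
```

Because every column must have the same length, a missing reading would break the pairing of observations, which is why the basic matrix has to be free of gaps.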

RESULTS AND DISCUSSION
Table 1 shows the basic descriptive statistics for the variables being studied at the six sampling points during the 13 months of collection, between September 2003 and September 2004.
In Table 1, it is possible to observe the high values of the coefficient of variation shown by the variables, demonstrating their variability during the sampling period. This can be explained by the anthropic interventions that occur along the watershed and by the variation in precipitation and flow during the study period. Table 1 also shows the proximity between the mean and the median values for the parameters FOT, pH and TU.
The Spearman correlation matrix for the monitored variables is found in Table 2. Table 2 shows a high direct correlation for the following pairs of variables: (NIA-AMO), (NIA-NIT) and (NIT-AMO), at the 5% significance level. This reveals dependence among variables caused by natural processes. This interaction among variables can be explained by the fact that they are all derived from nitrogen compounds. According to Medeiros et al. (2002) and Piedras et al. (2006), nitrogen undergoes several changes in form and oxidation state in the aquatic environment. Those of greatest interest, in descending order of oxidation state, are: nitrate (NO3-), nitrite (NO2-), ammoniacal nitrogen (NH4-N) or ammonia (NH3 or NH4+) and organic nitrogen (dissolved or in suspension).
It is possible to infer that the addition of ammoniacal nitrogen determined the increase in ammonia and nitrate concentrations. With the increases in nitrate, the quantity of organic matter increases, which leads to a consequent increase in turbidity and suspended solids, as explained by Andrade et al. (2007) in the Rio Acaraú basin study, in which the turbidity and suspended solids variables were highly correlated, as were the pairs of variables "color and turbidity" and "color and ammonia". These authors explained that the correlations between color, turbidity and suspended solids can be understood by the fact that water color is defined by the reflection and refraction of light on dissolved or suspended materials, linking the intensification of water color to the contribution of municipal and industrial sewage and agricultural activities.
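The descriptive statistics discussed here (mean, median and coefficient of variation) can be computed as in the sketch below. The turbidity values are hypothetical, and the use of the sample standard deviation for the coefficient of variation is an assumption, since Table 1 does not state which estimator was applied.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return mean, median and coefficient of variation (in percent)."""
    m = mean(values)
    return {
        "mean": m,
        "median": median(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "cv_percent": 100 * stdev(values) / m,
    }


# Hypothetical turbidity readings; not data from the study.
turbidity = [10, 12, 11, 13, 14]
summary = describe(turbidity)
```

A mean close to the median, as reported for FOT, pH and TU, suggests a roughly symmetric distribution, while a high coefficient of variation signals strong variability relative to the mean.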
However, the SS variable has the opposite effect, according to Table 2. Since the nitrate has a saline nature, its increase also influences the increased electrical conductivity of the water.
On the other hand, increased FOT implied a growth in the quantity of algae, which caused an increase in TU and SS. This fact characterizes a process of water eutrophication, as explained by Mansor et al. (2006), who quantified rural diffuse contributions of N and P to surface water. Thus, the correlation matrix revealed that SS was independent of the NIA, FOT, CE and pH parameters.
Although a good correlation between the TU and SS variables was expected, the statistical analysis showed that the variables were independent in this study, and the option was made to compute the water quality index without the suspended solids parameter (SS). Table 3 shows the decomposition of the correlation matrix which, conditioned on the existing interrelations among the original data, resulted in a reduction of the space of observed variables, expressed by the common factors. After a few attempts, it was concluded that three factors would be adequate for the case study. After that, the value of each factor was calculated for each sampling station. This value is called a factorial score, and it constitutes the Water Quality Index (WQI).
Using the principal components method, three factors associated with eigenvalues greater than one were selected, which explained 72.1% of the total variance of the original variables, indicating a good degree of information conservation.
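The selection rule (retain factors whose eigenvalue exceeds one) can be illustrated with a short sketch. It assumes seven variables, that is, the eight monitored parameters minus SS, so that the eigenvalues of the correlation matrix sum to 7. The first eigenvalue and the sum of the first three are back-calculated from the reported 38.3% and 72.1%; all other individual values are hypothetical placeholders.

```python
def select_factors(eigenvalues, threshold=1.0):
    """Keep eigenvalues above the threshold and report their variance share."""
    total = sum(eigenvalues)
    retained = [ev for ev in eigenvalues if ev > threshold]
    explained = [ev / total for ev in retained]
    return retained, explained


# Assumed eigenvalues for a 7 x 7 correlation matrix (they sum to 7).
# 2.681 / 7 = 38.3% and (2.681 + 1.316 + 1.050) / 7 = 72.1%;
# the split between the second and third values, and the rest, is hypothetical.
eigenvalues = [2.681, 1.316, 1.050, 0.800, 0.600, 0.400, 0.153]

retained, explained = select_factors(eigenvalues)
cumulative = sum(explained)
```

With these assumed values, three factors are retained and together account for about 72.1% of the total variance.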
Since the first factor (F1) explains the greatest variability of data (38.3%), it is adopted as a water quality index (WQI).
Observing the values of the estimated communalities, note that 91.0% of the variance of the AMO parameter is explained by the three common factors (F1, F2 and F3), while 58.6% of the CE variance is explained by these factors. Thus, upon analyzing the uniqueness (the complement of the communality), it is observed that the smallest unexplained share of the unit total variance occurs in the AMO variable (9.0%) and the largest occurs in the CE variable (41.4%).
Using the Bartlett criterion, which was also used by Toledo & Nicolella (2002), the Water Quality Index (WQI) was represented by the following expression (Eq. 2).
The WQI calculated in this study lies in the interval between 0 and 2, and the lower its value, the better the quality of the water. According to the coefficients of the standardized variables Zi in the WQI, AMO was the variable that most influenced the index, followed by NIA and NIT, and the TU variable was the least influential.
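The relation between the two quantities can be written out explicitly. Assuming standardized variables (unit total variance) and three retained factors, the communality $h_j^2$ of variable $j$ is the sum of its squared factor loadings, and the uniqueness is its complement:

```latex
h_j^2 = a_{j1}^2 + a_{j2}^2 + a_{j3}^2, \qquad
u_j^2 = 1 - h_j^2

% Worked values from the reported communalities:
u_{\mathrm{AMO}}^2 = 1 - 0.910 = 0.090, \qquad
u_{\mathrm{CE}}^2 = 1 - 0.586 = 0.414
```

The symbol $h_j^2$ is introduced here only for illustration; the loadings $a_{jp}$ are those of the factorial model.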
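To illustrate how an index of this kind is evaluated, the sketch below computes a factor score as a weighted sum of standardized variables. The coefficient values are hypothetical placeholders and are not those of Eq. 2; only their ordering (AMO largest, followed by NIA and NIT, with TU smallest) follows the text. The function name and the sample z-values are likewise invented for the example.

```python
def factor_score(z_values, coefficients):
    """Weighted sum of standardized variables (one factor score)."""
    return sum(coefficients[name] * z_values[name] for name in coefficients)


# Hypothetical score coefficients; NOT the values of the study's Eq. 2.
coefficients = {
    "AMO": 0.30,
    "NIA": 0.28,
    "NIT": 0.25,
    "CE": 0.15,
    "FOT": 0.10,
    "pH": 0.08,
    "TU": 0.05,
}

# Hypothetical standardized readings for one sampling point and month.
z_sample = {name: 1.0 for name in coefficients}

score = factor_score(z_sample, coefficients)
most_influential = max(coefficients, key=lambda name: abs(coefficients[name]))
```

In a linear index of this form, the variable with the largest absolute coefficient changes the score most per unit change in its standardized value, which is the sense in which AMO is described as the most influential variable.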
According to Sperling (1996) and Vidal et al. (2000), in water sources and rivers the determination of the predominant form of nitrogen may provide information about the pollution status. Nitrogen compounds in organic form or as ammonia indicate recent pollution, while nitrite and nitrate indicate more remote contamination. Since ammoniacal nitrogen is formed in one of the first steps of organic matter decomposition, its presence can be related to the precarious construction of wells and the lack of aquifer protection (Alaburda & Nishihara, 1998).
According to Silva & Araújo (2003), elevated levels of nitrates indicate contamination by the inappropriate disposal of human or industrial waste, or by the use of nitrogen fertilizers in agriculture.
Figure 2A shows the variation of the calculated WQI value for the six points monitored over the collection period, and the corresponding precipitation values. It was observed that, in general, the P1 collection point showed the best water quality indices, while P6 had the worst index of all the sampling points. This was expected, since P1 is the source of the Rio das Pedras (main river), corresponding to the highest point of the watershed, and P6 corresponds to the point of greatest contribution, which even receives urban waste.
March 2004 showed the best overall WQI values, whereas May 2004 had the worst. It is believed that the poor water quality during this period is due to the application of fertilizers before tomato transplanting in the watershed, since most producers plant from February to April and apply pre-planting fertilizers from January to March. Thus, a worsening in water quality can be observed, especially at P6, which receives the contribution of almost the entire watershed. Precipitation (the sum of the daily rainfall in each month) influenced the water quality index, since its behavior caused alterations in the data in some of the monitored months, as also occurred in the studies of Carvalho et al. (2000) and Zimmermann et al. (2008). It is important to note that the relationship between the days of greatest precipitation and the days on which field samples were collected was not studied.
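The monthly precipitation used in this comparison is the sum of the daily rainfall in each month. A minimal sketch of that aggregation is given below; the daily records are hypothetical and the ISO date-string format is an assumption made for the example.

```python
from collections import defaultdict


def monthly_totals(daily_records):
    """Sum daily rainfall (mm) by month, keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for date_str, rainfall_mm in daily_records:
        totals[date_str[:7]] += rainfall_mm  # 'YYYY-MM-DD' -> 'YYYY-MM'
    return dict(totals)


# Hypothetical daily rainfall records (date, mm); not data from the study.
daily = [
    ("2004-03-01", 12.5),
    ("2004-03-15", 7.5),
    ("2004-05-02", 3.0),
]
totals = monthly_totals(daily)
```

Pairing each monthly total with the WQI of the same month reproduces the kind of comparison plotted in Figure 2A, although it does not capture how close each sampling day was to a rainfall event.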
A comparison of the WQI calculated for the source of the Rio das Pedras (P1) and the point of greatest contribution (P6) is shown in Figure 2B. From Figure 2B it is possible to observe a similar behavior for the two sampled points despite the distance between them. Furthermore, the occurrence of contamination along the watershed was confirmed, since the water quality at P6 (downstream from the source of the Rio das Pedras) proved to be inferior to that at P1 over the thirteen months of the collection. However, this contamination is not caused only by agricultural activity, since P6 also receives urban effluents.

CONCLUSIONS
1. The developed water quality index demonstrated the deteriorating conditions of water quality in the Rio das Pedras Watershed due to agricultural activities, although the analysis was not comprehensive due to the lack of monitoring of other variables.
2. The elaboration of the WQI permitted the characterization of water quality along the watershed, demonstrating its potential as a tool to contribute to the monitoring of water resource availability in the region.
Figure 1. Hydrological map of the Rio das Pedras Watershed and the water sample collection points

Table 1. Basic descriptive statistics of the eight variables being studied at the six sampling points from September 2003 to September 2004