
Multivariate statistical analysis to support the minimum streamflow regionalization

Part of the first author's doctoral thesis. Project funded by FAPEMIG, CAPES and CNPq.

ABSTRACT

This study aimed to develop a methodology based on multivariate statistical analysis (principal component analysis and cluster analysis) to identify the most representative variables in studies of minimum streamflow regionalization and to optimize the identification of hydrologically homogeneous regions for the Doce river basin. Ten variables describing the climatic and morphometric characteristics of the basin were used, each determined for the 61 gauging stations: three dependent variables indicative of minimum streamflow (Q7,10, Q90 and Q95) and seven independent variables (total annual rainfall – Pa; total semiannual rainfall of the dry and of the rainy season – Pss and Psc; watershed drainage area – Ad; length of the main river – Lp; total length of the rivers – Lt; and average watershed slope – SL). The principal component analysis indicated that SL was the least representative variable, and it was therefore discarded. The most representative independent variables were Ad and Psc. The best divisions into hydrologically homogeneous regions for the three flow characteristics were obtained using the Mahalanobis similarity matrix together with the complete linkage clustering method. The cluster analysis identified four hydrologically homogeneous regions in the Doce river basin.

Keywords: principal component analysis; cluster analysis; homogeneous regions


INTRODUCTION

The hydrological behavior of a river basin depends on its geomorphological characteristics, its climate and its type of plant cover. Several physical and biotic variables of a basin therefore play an important role in the processes of the hydrological cycle.

River basins with large drainage areas can behave differently, in hydrological terms, from one part to another. For that reason, identifying hydrologically homogeneous regions is one of the first steps toward sound water resources management.

Hydrological regionalization is defined as any process of transferring information from a region with known hydrological behavior to other locations, usually ungauged. The transfer may involve the data series themselves, relevant statistical parameters such as the mean, variance, and maximum and minimum events, or equations and parameters related to these statistics.

MISHRA & COULIBALY (2009) describe the importance of reliable watershed-scale information, given its numerous practical uses: hydrology, agronomy, climatology, hydrogeology, water resources planning and management, decision-making for public policies and the siting of industrial plants.

In this context, multivariate statistical analyses can substantially assist hydrological regionalization studies by reducing database processing time and increasing the reliability of the results. Internationally, this is borne out by several studies on hydrological regionalization (ASSANI et al., 2011; MWALE et al., 2011; SAMUEL et al., 2011; ENGELAND & HISDAL, 2009; CASTIGLIONI et al., 2009).

Principal component analysis aims to examine the correlations between the studied variables, to summarize a large set of variables in a smaller set that conveys equivalent information, to evaluate the importance of each variable, and to eliminate those that contribute little to the variation among the evaluated individuals (WILKS, 2006).

In recent years, this technique has been used in a variety of domains, including medicine (SILVA et al., 2010), remote sensing (JESUS & EPIPHANIO, 2010), farming (ARRUDA et al., 2012; ARRUDA et al., 2011; YAMAKI et al., 2009), chemistry (BELLOMARINO et al., 2010; FARO JR. et al., 2010), soil analysis (ISLABÃO et al., 2013) and environmental studies (GUEDES et al., 2012; REID & SPENCER, 2009), among others.

Cluster analysis is an exploratory multivariate technique that aims to classify individuals into homogeneous groups (WILKS, 2006). It has been employed in several areas of knowledge, for instance genetics (CARVALHO et al., 2009), management (COUTO JR & GALDI, 2012), health (RESENDES et al., 2010) and environmental engineering (HATVANI et al., 2011). In hydrology, cluster analysis is often used to define classes or to group stations into homogeneous climate regions, i.e. regionalization.

In light of the above, this study aimed to develop a methodology based on multivariate statistical analysis (principal component analysis and cluster analysis) to identify the most representative variables in studies of minimum streamflow regionalization and to optimize the delineation of hydrologically homogeneous regions for the Doce river basin.

MATERIAL AND METHODS

Region of study

The Doce river basin is located in the Southeast region of Brazil, between parallels 17°45' and 21°15' S and meridians 39°30' and 43°45' W, with an average altitude of 578 m. Its drainage area is approximately 83,400 km², of which 86% lies in the state of Minas Gerais and 14% in the state of Espírito Santo. The basin has approximately 3.1 million inhabitants, 70% of whom live in urban areas. About 98% of the basin area lies within the Atlantic Rainforest biome, and the remainder belongs to the Cerrado biome. The main economic activities are mining, metallurgy, forestry and farming (CBH-Doce, 2010).

Database and Applications

The study used data from 61 gauging stations belonging to the hydrometeorological network of the National Water Agency (NWA). The series consisted of daily flow data for the base period from 1976 to 2005 (Table 1). Data were used only up to 2005 because, when the study began, this was the most recent year with quality-checked data provided by the NWA.

TABLE 1
Gauging stations selected for a minimum streamflow regionalization.

The vector base of elevation (contour lines and spot heights) and of hydrography for the region was obtained from the Brazilian Institute of Geography and Statistics (IBGE) at a scale of 1:250,000.

The hydrographically conditioned digital elevation model (HCDEM) was generated, the morphometric variables and average precipitation were extracted automatically, and the results were spatialized with the ArcGIS® 10.0 software, used as a tool for vector geoprocessing and spatial data representation.

Ten variables were considered in this study. Three were dependent variables to be regionalized: the minimum average flow over seven consecutive days with a ten-year return period (Q7,10), and the minimum streamflows associated with 90% and 95% permanence in time (Q90 and Q95). Seven were independent variables: total annual rainfall (Pa), total semiannual rainfall of the dry season (Pss) and of the rainy season (Psc), in mm; watershed drainage area (Ad), in km²; length of the main river (Lp), in km; total length of the rivers (Lt), in km; and average watershed slope (SL), in %.
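As an illustration of how the permanence flows and the seven-day minimum can be read from a daily series, the sketch below uses an empirical flow-duration curve without interpolation. The function names and the synthetic series are hypothetical and are not taken from the study. Obtaining Q7,10 additionally requires fitting a probability distribution to the annual seven-day minima, which is not covered here.

```python
from statistics import mean


def permanence_flow(daily_flows, permanence_pct):
    """Flow equaled or exceeded `permanence_pct` percent of the time.

    Flows are ranked in descending order and the value at the requested
    exceedance fraction is returned (no interpolation; an illustrative choice).
    """
    ranked = sorted(daily_flows, reverse=True)
    # Number of ranked values that fall inside the requested fraction.
    k = max(1, int(permanence_pct * len(ranked) / 100))
    return ranked[k - 1]


def min_7day_mean(daily_flows):
    """Smallest mean flow over any 7 consecutive days in the series."""
    windows = (daily_flows[i:i + 7] for i in range(len(daily_flows) - 6))
    return min(mean(w) for w in windows)


# Synthetic daily flows 1, 2, ..., 100 (illustrative values only).
flows = [float(q) for q in range(1, 101)]
q90 = permanence_flow(flows, 90)   # flow exceeded 90% of the time
q95 = permanence_flow(flows, 95)   # flow exceeded 95% of the time
q7 = min_7day_mean(flows)          # lowest 7-day average flow
```

In practice the seven-day minimum would be computed per year, producing the annual series from which the ten-year return period value is estimated.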

Principal component analysis

Based on principal component analysis, the original set of observed independent variables (Pa, Pss, Psc, Ad, Lp, Lt and SL) was transformed into a new set of variables, named principal components, meeting the following criteria (JOLLIFE, 2002):

  1. each principal component Yi of the data matrix is a linear combination of the seven independent variables considered;

  2. the sum of the squared coefficients aij of each component is equal to 1;

  3. each principal component has its own set of coefficients;

  4. the components are uncorrelated with one another;

  5. among all the components, Y1 has the greatest variance, Y2 the second greatest, and so forth;

  6. the sum of the variances of the principal components (Yi) is equal to the sum of the variances of the original variables (Xj).
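The criteria listed above can be written compactly as follows. This is a restatement in standard notation, with p = 7 independent variables and k used here as an auxiliary index:

```latex
\begin{aligned}
Y_i &= a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p, \qquad p = 7,\\
\sum_{j=1}^{p} a_{ij}^2 &= 1,\\
\operatorname{Cov}(Y_i, Y_k) &= 0 \quad (i \neq k),\\
\operatorname{Var}(Y_1) &\ge \operatorname{Var}(Y_2) \ge \dots \ge \operatorname{Var}(Y_p),\\
\sum_{i=1}^{p} \operatorname{Var}(Y_i) &= \sum_{j=1}^{p} \operatorname{Var}(X_j).
\end{aligned}
```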

As R is a symmetric correlation matrix of dimensions p × p, from which the eigenvalues (λi) and the eigenvectors (ai) are extracted, the solution was obtained by solving the system (eq. 1):

$$(R - \lambda_i I)\,a_i = \Phi \qquad (1)$$

in which,

λi – characteristic roots (or eigenvalues) of matrix R. There are p eigenvalues, corresponding to the variances of each of the p principal components;

I – identity matrix of dimension p × p;

ai – eigenvector (or characteristic vector), a p × 1 matrix containing the p coefficients associated with eigenvalue λi and corresponding to principal component Yi; and

Φ – zero vector of dimension p × 1.

One of the most common problems in applying multivariate statistical models is that they depend on the units and scales in which the variables were measured. The variables were therefore standardized through eq. (2), which eliminates this dependence:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{\sigma(X_j)} \qquad (2)$$

in which,

Zij – standardized variable;

Xij – i-th observation of the j-th original variable;

σ(Xj) – standard deviation of the j-th original variable; and

X̄j – average of the j-th original variable.

The importance of each principal component was evaluated through its correlation with each variable Xj (eq. 3):
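For components extracted from the correlation matrix R of the standardized variables, this correlation takes a simple closed form. The expression below is the standard textbook result, offered as an assumption about the form of eq. 3 rather than as the authors' exact notation:

```latex
r_{Y_i, Z_j} = a_{ij}\sqrt{\lambda_i}
```

Here aij is the j-th coefficient of eigenvector ai and λi is the corresponding eigenvalue, so a variable weighs more heavily on a component when both its coefficient and the component's variance are large.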

The following criteria were used to select the principal components in this study:

  • accumulated percentage of the total variance of the original data greater than or equal to 75% (JOLLIFE, 2002); and

  • eigenvalues greater than or equal to the average of the eigenvalues (RENCHER, 2002).
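The procedure described in this section (standardization, eigen-decomposition of R and the two selection criteria) can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical and the data are synthetic stand-ins for 61 stations and four variables.

```python
import numpy as np


def pca_from_correlation(X):
    """PCA of standardized data via eigen-decomposition of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardization, eq. (2)
    R = np.corrcoef(Z, rowvar=False)                    # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                   # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Correlation between each standardized variable (row) and component (column).
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, eigvecs, loadings


def select_components(eigvals, min_cum_var=0.75):
    """Number of leading components kept under each of the two criteria."""
    cum = np.cumsum(eigvals) / eigvals.sum()
    n_cum = int(np.searchsorted(cum, min_cum_var) + 1)   # first k reaching the threshold
    n_mean = int(np.sum(eigvals >= eigvals.mean()))      # eigenvalues >= their average
    return n_cum, n_mean


# Synthetic data (not from the study): two morphometric and two rainfall variables.
rng = np.random.default_rng(42)
area = rng.lognormal(mean=7.0, sigma=1.0, size=61)
length = 1.5 * np.sqrt(area) * rng.lognormal(sigma=0.1, size=61)
rain = rng.normal(1200.0, 150.0, size=61)
rain_wet = 0.8 * rain + rng.normal(0.0, 30.0, size=61)
X = np.column_stack([area, length, rain, rain_wet])

eigvals, eigvecs, loadings = pca_from_correlation(X)
n_cum, n_mean = select_components(eigvals)
```

Because the decomposition is applied to a correlation matrix, the eigenvalues sum to the number of variables, so their average is 1.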

Cluster analysis

The clustering process comprised two steps: first, estimating the Mahalanobis similarity between the 61 gauging stations; second, applying a clustering technique, either the single linkage or the complete linkage method, to form the groups.

When Euclidean distance is used for cluster analysis, all variables are assumed to be uncorrelated with one another, although this assumption is usually ignored. To avoid this common problem in hydrological regionalization studies, the similarity matrix was built with the Mahalanobis distance. In practice, the Mahalanobis distance was computed by applying the Euclidean distance (eq. 4) to the standardized data matrix:

$$d_{iz} = \sqrt{\sum_{j=1}^{p} \left(Z_{ij} - Z_{zj}\right)^2} \qquad (4)$$

in which,

diz – distance between the i-th and z-th gauging stations; and

Zij and Zzj – observations of the i-th and z-th gauging stations (i = 1, 2, …, n and z = 1, 2, …, n) for the j-th variable or absolute frequency in each studied class (j = 1, 2, …, p).
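The distance computation described here, Euclidean distance applied to standardized data, can be sketched in plain Python as follows. The function names and the three example stations are hypothetical. Note that the general Mahalanobis distance weights the differences by the inverse covariance matrix; the two coincide when the variables are uncorrelated, and this sketch implements only the procedure as described in the text.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(rows):
    """Column-wise z-scores (eq. 2): subtract the mean, divide by the std. dev."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in rows]


def distance_matrix(z_rows):
    """Pairwise Euclidean distances between standardized stations (eq. 4)."""
    n = len(z_rows)
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(z_rows[i], z_rows[k])))
             for k in range(n)] for i in range(n)]


# Three illustrative stations described by two variables (synthetic values).
stations = [[100.0, 1200.0], [200.0, 1300.0], [300.0, 1400.0]]
D = distance_matrix(standardize(stations))
```

Standardizing first keeps a variable measured in large units, such as drainage area, from dominating the distance.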

The single linkage method starts from a distance (dissimilarity) matrix between gauging stations (individuals). The two most similar individuals (those with the smallest distance between them) were identified and merged into an initial group. The distance from this first group to each of the other individuals was then calculated.

The distance between a group and an individual was given by the expression (eq. 5):

$$d_{(ab)c} = \min\{d_{ac},\, d_{bc}\} \qquad (5)$$

That is, the distance between the group formed by individuals a and b and individual c was given by the smallest of the distances of the pairs ac and bc.

After the smallest distances between the group and the neighboring individuals were identified, a new dissimilarity matrix, of smaller dimension than the initial one, was built, and the most similar individuals and/or groups were identified again. They were then either incorporated into the initial group or arranged into a second group, depending on whether the smallest distance in the new dissimilarity matrix occurred between two other individuals.

In the subsequent stages, increasingly smaller dissimilarity matrices were used until all individuals were merged into a single group, composing a dendrogram or tree.

The complete linkage clustering method follows a procedure similar to the single linkage method, with one important difference: at each stage, the distance between two individuals and/or groups was taken as the greatest distance between their members.

The distance between a group and an individual was given by the expression (eq. 6):

$$d_{(ab)c} = \max\{d_{ac},\, d_{bc}\} \qquad (6)$$

That is, the distance between the group formed by individuals a and b and individual c was given by the greatest of the distances of the pairs ac and bc.

The construction of the dissimilarity matrices, of smaller dimensions than the initial one, followed the same procedure described for the single linkage method. The only difference was that groups were formed through maximum distances (complete linkage) rather than minimum distances (single linkage).
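The two linkage rules differ only in how the distance between groups is taken, which a naive agglomerative routine makes explicit. The sketch below is illustrative: the function name is hypothetical, the four points on a line are synthetic, and the quadratic search is written for clarity rather than efficiency.

```python
def agglomerate(D, linkage="complete"):
    """Naive agglomerative clustering on a square distance matrix D.

    linkage="single" takes the minimum pairwise distance between groups (eq. 5);
    linkage="complete" takes the maximum (eq. 6).
    Returns the merges in order as (members_a, members_b, distance).
    """
    pick = min if linkage == "single" else max
    clusters = [(i,) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                d = pick(D[a][b] for a in clusters[i] for b in clusters[k])
                if best is None or d < best[0]:
                    best = (d, i, k)          # closest pair of groups so far
        d, i, k = best
        merges.append((clusters[i], clusters[k], d))
        merged = clusters[i] + clusters[k]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)]
        clusters.append(merged)
    return merges


# Four individuals located at 0, 1, 3 and 7 on a line (synthetic example).
points = [0, 1, 3, 7]
D = [[abs(p - q) for q in points] for p in points]
single_d = [d for _, _, d in agglomerate(D, "single")]
complete_d = [d for _, _, d in agglomerate(D, "complete")]
```

With the same data, the merge distances grow faster under complete linkage, which tends to produce more compact and clearly separated groups.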

The number of homogeneous regions for each flow characteristic was defined using the criterion of inertia between jumps, in which the first visible discontinuity in the plot of dissimilarity distance against clustering steps is taken as the cut-off (MELO JÚNIOR et al., 2006; RENCHER, 2002; WILKS, 2006).
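The cut-off is identified visually in the study. As a rough numerical stand-in, the sketch below flags the first increase in merge distance that exceeds a multiple of the average increase. The rule, the `factor` parameter and the example distances are assumptions made for illustration only.

```python
def first_jump(merge_distances, factor=2.0):
    """Locate the first large jump in the sequence of merge distances.

    A jump is an increase larger than `factor` times the mean increase
    (an assumed rule standing in for visual inspection of the plot).
    Returns (merges_kept, n_groups) when cutting just before the jump.
    """
    steps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    threshold = factor * sum(steps) / len(steps)
    n_individuals = len(merge_distances) + 1
    for j, step in enumerate(steps):
        if step > threshold:
            merges_kept = j + 1                  # merges performed before the jump
            return merges_kept, n_individuals - merges_kept
    return len(merge_distances), 1               # no jump found: a single group


# Merge distances for 7 hypothetical individuals (6 clustering steps).
distances = [0.5, 0.6, 0.8, 0.9, 3.0, 3.4]
merges_kept, n_groups = first_jump(distances)
```

Each merge reduces the number of groups by one, so stopping after k merges of n individuals leaves n − k groups.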

Multiple regression analysis

The regression models used to create regionalization equations for each hydrologically homogeneous region were linear, potential (power), exponential, logarithmic and reciprocal.

The models obtained by applying multiple regression within the hydrologically homogeneous regions given by the cluster analysis were selected according to the following criteria:

  • equation representative of the studied event;

  • smaller number of independent variables, according to the relative significance given by the principal component analysis;

  • greater values of the adjusted coefficient of determination;

  • lower values of the factorial standard error;

  • significant results by the F test;

  • continuity of flows; and

  • convenience of geographic spatialization of the obtained equations.

To verify the fit of the regression models to the data, the following thresholds were used: adjusted coefficient of determination greater than 0.70 (r²a > 0.70), standard error of estimate lower than 0.5 (EP < 0.5) and a significance level of 5% by the F test.
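As an illustration of fitting one of the candidate forms, the sketch below fits a potential (power) model with a single predictor, Q = a·Ad^b, by least squares on log-transformed data and reports the adjusted coefficient of determination. It is a simplified, single-variable sketch: the function name and data are hypothetical, and the statistic is computed in log space, which is an assumption rather than the study's stated procedure.

```python
from math import exp, log


def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b on log-transformed data.

    Returns (a, b, adjusted r2), with r2 evaluated in log space.
    """
    lx, ly = [log(v) for v in x], [log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                                  # exponent
    a = exp(my - b * mx)                           # multiplicative coefficient
    ss_res = sum((v - (log(a) + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    r2 = 1 - ss_res / ss_tot
    k = 1                                          # number of predictors
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return a, b, r2_adj


# Synthetic drainage areas (km2) and flows generated from an exact power law.
areas = [200.0, 400.0, 800.0, 1600.0, 3200.0]
flows = [0.01 * area ** 0.9 for area in areas]
a, b, r2_adj = fit_power_law(areas, flows)
meets_threshold = r2_adj > 0.70                    # fit criterion from the text
```

With real station data the residuals would not vanish, and the remaining criteria (standard error and F test) would also need to be checked.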

RESULTS AND DISCUSSION

Principal components analysis

Principal component analysis was conducted on the seven independent variables (Pa, Pss, Psc, Ad, Lp, Lt and SL) determined for each of the 61 gauging stations. Because the data were standardized to zero mean and unit variance, the total variance of the multivariate data set was equal to the number of variables analyzed.

Table 2 displays the correlation matrix between the standardized independent variables. To evaluate the importance of each variable and eliminate those that contribute little to the variation among the individuals evaluated in the flow regionalization analysis, the principal components of the studied variables were identified (Table 3).

TABLE 2
Correlation matrix R between the independent variables considered.

TABLE 3
Principal components (CP) of studied variables.

According to HELENA et al. (2000), correlation coefficients greater than 0.5 express a strong relationship between the evaluated variables. Table 2 shows that the climate variables Pa and Psc are strongly correlated with one another, and that Pss is moderately correlated with Pa and Psc. The morphometric variables Ad, Lp and Lt are highly correlated with one another; however, the morphometric variable SL is weakly correlated with the remaining variables (R < 0.5), which suggests that it could be excluded from this study.

Based on the data in Table 3, only the first two components (Y1 and Y2) were retained, as they simultaneously met the two selection criteria adopted (accumulated variance greater than or equal to 75% of the total variation and eigenvalues greater than or equal to 1). The remaining components, which together explained 22.08% of the total variation, were not considered. Table 4 presents the correlations, or load factors, between the seven standardized variables and the first two principal components.
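Both selection criteria can be checked with simple arithmetic. Because the components were extracted from the correlation matrix of p = 7 standardized variables, the eigenvalues sum to 7, so the threshold given by the average of the eigenvalues equals 1. The share explained by Y1 and Y2 follows from the 22.08% attributed to the other components:

```latex
\bar{\lambda} = \frac{1}{p}\sum_{i=1}^{p}\lambda_i = \frac{\operatorname{tr}(R)}{p} = \frac{7}{7} = 1,
\qquad
100\% - 22.08\% = 77.92\% \ge 75\%.
```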

TABLE 4
Load factors between the standardized variables (VP) and the principal components (CP) and variance (λi) of each principal component (i = 1, 2).

Table 4 shows that the standardized variables Z4, Z5 and Z6 are most strongly correlated with the first principal component (Y1), while Z1, Z2 and Z3 are most strongly correlated with the second principal component (Y2). The variable Z7 can be discarded from the study, as it contributes little to the variation among the evaluated individuals, confirming the result obtained from the analysis of the correlation matrix R.

The average watershed slope (SL) is of little relevance to the behavior of the studied flow characteristics because it represents each drainage area as a surface of uniform slope, which does not physically represent the natural process of runoff in river channels. For this reason, the exclusion of SL from the selected set was expected.

Water resources monitoring involves a great number of variables, and eliminating unnecessary information leads directly to savings in time and resources. MISHRA & COULIBALY (2009) demonstrated the importance of having reliable variables in watershed engineering studies. CASTIGLIONI et al. (2009) also used physiographic variables when trying to identify hydrologically homogeneous regions.

Physically, the principal component Y1 represents the most representative morphometric variables, and the principal component Y2 represents the average rainfall over the drainage area upstream of each gauging station. ASSANI et al. (2011) obtained good results using principal component analysis in river basins in Canada.

In line with WILKS (2006), the results showed that using principal component analysis in flow regionalization, even in a preliminary way, is fundamental to eliminating variables of little significance, thus increasing the spatial reliability of the hydrologically homogeneous regions.

Cluster analysis

After SL was discarded on the basis of the principal component analysis, the homogeneous regions for the three flows were obtained separately from the Mahalanobis distance matrix, using the standardized variables that showed the greatest correlations with the first two principal components (Ad, Lt, Lp, Pa, Psc and Pss).

The nearest neighbor (single linkage) method produced irregular clusters for the three flow characteristics studied and was discarded. MELO JÚNIOR et al. (2006) encountered a similar situation and also disregarded the clusters obtained with the nearest neighbor method.

The complete linkage method gave easily interpreted results and the same number of clusters for the three flows evaluated. For this method, the cut-off can be identified at a dissimilarity distance of approximately 19% in the dendrogram, at which four groups with homogeneous flow characteristics are formed for all the flows considered. To illustrate the result, Figures 1 and 2 present, respectively, the plot of dissimilarity distance vs. clustering steps and the dendrogram for the variable Q7,10.

FIGURE 1
Dissimilarity distance vs. clustering steps for Q7,10 using the furthest neighbor (complete linkage) method.

FIGURE 2
Dendrogram for Q7,10 showing the clustering steps of the furthest neighbor (complete linkage) method.

In Figure 1, the first discontinuity is observed between clustering steps 56 and 57. From this result, four hydrologically homogeneous regions were identified for the three flow characteristics analyzed, all of which behaved in the same way under the clustering method (Figure 2).

MELO JÚNIOR et al. (2006) obtained good results with the complete linkage clustering method in studies of precipitation.

Homogeneous regions

The complete linkage clustering method yielded four regions with homogeneous flow characteristics for the Doce river basin, described as follows:

  • Region I – composed of the stations with the smallest flows and drainage areas. Spatially, it comprises headwater regions and small tributaries. Seventeen gauging stations make up this region for all the flows studied, with drainage areas varying from 166 to 970 km².

  • Region II – intermediate region between regions I and III, composed of 12 gauging stations with drainage areas varying from 757 to 1,396 km².

  • Region III – intermediate region between regions II and IV. Spatially, it consists of the main tributaries of the rivers with the greatest flows in the basin and comprises 13 gauging stations with drainage areas varying from 1,200 to 3,055 km².

  • Region IV – composed of the stations with the greatest flows and drainage areas. Spatially, it consists of the main channel of the Doce River and its main tributaries: Piracicaba, Santo Antônio, Suaçuí and Manhuaçu. Nineteen gauging stations make up this homogeneous region, with drainage areas varying from 2,578 to 81,940 km².

Figure 3 presents the spatial configuration of the four hydrologically homogeneous regions for the flows Q7,10, Q90 and Q95, which showed the same hydrological behavior. To delimit the homogeneous regions, the areas of influence of their gauging stations were extended to the point of outflow into the larger river downstream, following the procedure described by MARQUES et al. (2009).

FIGURE 3
Hydrologically homogeneous regions for minimum streamflow obtained for the Doce river basin.

Figure 3 shows that the homogeneous region of greatest spatial extent is region I (headwater regions and smaller drainage areas), followed by region IV (channel of the main river and main tributaries), region III and region II.

Drainage areas smaller than 160 km² and greater than 82,000 km² were included in hydrological regions I and IV, respectively. It should be emphasized, however, that large parts of the Doce river basin (those with drainage areas smaller than 160 km²) lack adequate monitoring, so other criteria must be adopted to extrapolate results to these areas.

RIBEIRO et al. (2005) worked with minimum reference streamflows (Q7,10, Q90 and Q95) and obtained seven hydrologically homogeneous regions for the Doce River basin. MARQUES et al. (2009) investigated the same watershed using minimum reference streamflows (Q7,10, Q90 and Q95) in quarterly periods, obtaining hydrologically homogeneous regions.

The regions considered by these authors are subdivisions and/or spatial unions of the hydrologically homogeneous regions found with the methodology presented in this study, and were based on existing water resources management units and sub-basins of the Doce River basin.

It should be noted that the methodology proposed in this study is based on multivariate statistical analysis, and that the results reflect the general hydrological behavior of the Doce River basin.

The result obtained by the proposed methodology was complemented by the multiple regression analysis between the dependent variables (minimum streamflow characteristics) and the independent variables (climate and morphometric variables), to obtain regional equations for the four hydrologically homogeneous regions.

Multiple regression analysis

For the hydrologically homogeneous regions obtained with the proposed approach, multiple regression equations of the linear, potential (power), exponential, logarithmic and reciprocal types were fitted to each of the investigated flow characteristics. Table 5 presents, for each homogeneous region, the regression equations that best fitted the variables Q7,10, Q90 and Q95.
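For reference, the five model types named above are commonly written as shown below for a flow characteristic Q and explanatory variables X_1, ..., X_n. This is only a sketch: the symbols β_i and X_i are assumed notation, and the exact parameterization used by the authors is not stated in this section.

```latex
\begin{align*}
\text{Linear:}      \quad & Q = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n \\
\text{Potential:}   \quad & Q = \beta_0 \, X_1^{\beta_1} X_2^{\beta_2} \cdots X_n^{\beta_n} \\
\text{Exponential:} \quad & Q = \exp\left(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n\right) \\
\text{Logarithmic:} \quad & Q = \beta_0 + \beta_1 \ln X_1 + \dots + \beta_n \ln X_n \\
\text{Reciprocal:}  \quad & Q = \left(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n\right)^{-1}
\end{align*}
```

Taking logarithms of the potential form gives ln Q = ln β_0 + β_1 ln X_1 + ... + β_n ln X_n, which is linear in the parameters and can therefore be fitted by ordinary least squares.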

TABLE 5
Regression models that best fitted the minimum and average flow characteristics, and the fit statistics obtained.

In order to meet the selection criteria for the regression equations, it was necessary to exclude three gauging stations from region I (56570000, 56935000, 56993002) and one gauging station from region IV (56880000).

By analyzing Table 5, it can be observed:

  • The regression model that best fitted the flow data was the potential (power) model. RIBEIRO et al. (2005) and MARQUES et al. (2009) obtained the same result for the regional equations of the Doce River basin;

  • The most important independent variable for the study was drainage area (Ad), followed by semiannual rainfall of the rainy season (Psc);

  • The regional equations presented for the four hydrologically homogeneous regions, defined by the methodology proposed in this study, showed determination coefficients higher than 0.70 and standard errors of estimate lower than 0.5, and were significant at the 5% level by the F test.
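The fitting of a potential model and the three selection statistics listed above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a single explanatory variable (Ad), computes the standard error in log space, and uses synthetic values (`areas`, `scatter`, `flows`) that are not taken from Table 5. The function name `fit_power_model` is hypothetical.

```python
import math

def fit_power_model(area, flow):
    """Fit Q = a * Ad**b by least squares on the log-transformed data."""
    x = [math.log(v) for v in area]
    y = [math.log(v) for v in flow]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # exponent of the drainage area
    ln_a = mean_y - b * mean_x         # intercept in log space
    ss_res = sum((yi - (ln_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    mse = ss_res / (n - 2)             # residual mean square, n - 2 d.f.
    return {
        "a": math.exp(ln_a),
        "b": b,
        "r2": 1.0 - ss_res / ss_tot,           # determination coefficient
        "std_err": math.sqrt(mse),             # standard error (log space)
        "f": (ss_tot - ss_res) / mse if mse > 0 else float("inf"),
    }

# Synthetic example: drainage areas (km2) and flows generated from
# Q = 0.004 * Ad**0.95 with an arbitrary multiplicative scatter.
areas = [160.0, 500.0, 1200.0, 5000.0, 20000.0, 82000.0]
scatter = [1.10, 0.90, 1.05, 0.95, 1.08, 0.93]
flows = [0.004 * ad ** 0.95 * k for ad, k in zip(areas, scatter)]

result = fit_power_model(areas, flows)

# Selection criteria quoted in the text; 7.71 is the tabulated 5% critical
# value of F with 1 and 4 degrees of freedom (n = 6 stations here).
meets_criteria = (
    result["r2"] > 0.70 and result["std_err"] < 0.5 and result["f"] > 7.71
)
print(result, meets_criteria)
```

With more than one explanatory variable (for example Ad and Psc), the same log transformation applies, but the coefficients come from a multiple linear regression on the logarithms rather than from the closed-form slope used here.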

The results achieved through multiple regression analysis were considered satisfactory, validating the scientific methodology presented in this study.

Combined with previous knowledge of the region, the use of spatial analysis tools and the experience of a hydrologist, principal component analysis and cluster analysis can support the delimitation of hydrologically homogeneous regions. They enable more consistent decision-making, based on a more reliable database (by eliminating variables that contribute little to the study) and on the clusters obtained (by verifying the statistical behavior of the flow characteristics in the dendrogram).
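As an illustration of the clustering step, the sketch below groups stations by complete linkage on a Mahalanobis distance matrix. It is a simplified example under stated assumptions: only two attributes per station are used so that the covariance matrix can be inverted by hand, the station values in `stations` are hypothetical, and the function names are not from the study.

```python
import math

def mahalanobis_matrix(points):
    """Pairwise Mahalanobis distances for 2-D points (sample covariance)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (no collinear attributes)
    # Elements of the inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            d = math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
            dist[i][j] = dist[j][i] = d
    return dist

def complete_linkage(dist, n_clusters):
    """Agglomerate until n_clusters remain. The distance between two
    clusters is the largest pairwise distance between their members."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical stations described by two attributes each
stations = [
    (0.0, 0.0), (0.2, 0.1),
    (5.0, 0.0), (5.1, 0.2),
    (0.0, 5.0), (0.1, 5.2),
    (5.0, 5.0), (5.2, 5.1),
]
regions = complete_linkage(mahalanobis_matrix(stations), n_clusters=4)
print(regions)
```

In this sketch the number of groups is passed in directly. In practice it would be chosen by inspecting the dendrogram, as described above.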

CONCLUSIONS

Principal component analysis gave satisfactory results for excluding poorly representative variables in the identification of hydrologically homogeneous regions.

The first two principal components, Y1 and Y2, accounted for 77.92% of the total variation in the data.

The Mahalanobis similarity matrix and the complete linkage clustering method performed well in the identification of hydrologically homogeneous regions for all studied flows.

Four hydrologically homogeneous regions were obtained for all studied minimum flow characteristics.

The regionalization equations obtained through multiple regression analysis for the minimum flow characteristics were considered satisfactory, validating the scientific methodology presented in this study.

The methodology proposed for identifying the number of homogeneous regions performed well, enabling the elimination of subjectivity in the identification of hydrologically homogeneous regions.

ACKNOWLEDGMENTS

The authors thank the Research Support Foundation of Minas Gerais state (FAPEMIG), the Coordination for the Improvement of Higher Education Personnel (CAPES), the National Council for Scientific and Technological Development (CNPq) and the Federal University of Viçosa (UFV) for funding this study.

REFERENCES

  • Arruda, N. P.; Hovell, A. M. C.; Rezende, C. M.; Freitas, S. P.; Couri, S.; Bizzo, H. R. Correlação entre precursores e voláteis em café arábica brasileiro processado pelas vias seca, semiúmida e úmida e discriminação através da análise por componentes principais. Química Nova, São Paulo, v. 35, n. 10, p. 2044-2051, 2012.
  • Arruda, N. P.; Hovell, A. M. C.; Rezende, C. M.; Freitas, S. P.; Couri, S.; Bizzo, H. R. Discriminação entre estádios de maturação e tipos de processamento de pós-colheita de cafés arábica por microextração em fase sólida e análise de componentes principais. Química Nova, São Paulo, v. 34, n. 5, p. 819-824, 2011.
  • Assani, A. A.; Chalifour, A.; Légaré, G.; Manouane, C.; Leroux, D. Temporal regionalization of 7-day low flows in the St. Lawrence watershed in Quebec (Canada). Water Resources Management, Dordrecht, v. 25, p. 3559-3574, 2011.
  • Bellomarino, S. A.; Parker, R. M.; Conlan, X. A.; Barnett, N. W.; Adams, M. J. Partial least squares and principal components analysis of wine vintage by high performance liquid chromatography with chemiluminescence detection. Analytica Chimica Acta, Amsterdam, v. 678, p. 34-38, 2010.
  • Carvalho, M. F.; Albuquerque Junior, C. L.; Guidolin, A. F.; Farias, F. L. Aplicação da análise estatística multivariada em avaliações de divergência genética através de marcadores moleculares dominantes em plantas medicinais. Revista Brasileira de Plantas Medicinais, Botucatu, v. 11, n. 3, p. 339-346, 2009.
  • Castiglioni, S.; Castellarin, A.; Montanari, A. Prediction of low-flow indices in ungauged basins through physiographical space-based interpolation. Journal of Hydrology, Amsterdam, v. 378, p. 272-280, 2009.
  • Couto Jr, C. G.; Galdi, F. C. Avaliação de empresas por múltiplos aplicados em empresas agrupadas com análise de cluster. Revista de Administração Mackenzie, São Paulo, v. 13, n. 5, p. 135-170, 2012.
  • Engeland, K.; Hisdal, H. A comparison of low flow estimates in ungauged catchments using regional regression and the HBV-Model. Water Resources Management, Dordrecht, v.23, p.2567-2586, 2009.
  • Faro Jr, A. C.; Rodrigues, V. O.; Eon, J.; Rocha, A. S. Análise por componentes principais de espectros nexafs na especiação do molibdênio em catalisadores de hidrotratamento. Química Nova, São Paulo, v. 33, n. 6, p. 1342-1347, 2010.
  • Guedes, H. A. S.; Silva, D. D.; Elesbon, A. A. A.; Ribeiro, C. B. M.; Matos, A. T.; Soares, J. H. P. Aplicação da análise estatística multivariada no estudo da qualidade da água do Rio Pomba, MG. Revista Brasileira de Engenharia Agrícola e Ambiental, Campina Grande, v.16, n. 5, p. 558-563, 2012.
  • Hatvani, G. I.; Kovács, J.; Kovács, I. S.; Jakusch, P.; Korponai, J. Analysis of long-term water quality changes in the Kis-Balaton Water Protection System with time series, cluster analysis and Wilk's lambda distribution. Ecological Engineering, Amsterdam, v. 37, p. 629-635, 2011.
  • Helena, B.; Pardo, R.; Vega, M.; Barrado, E.; Fernández, J. M.; Fernández, L. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Research, Amsterdam, v.34, p.807-816, 2000.
  • Islabão, G. O.; Pinto, M. A. B.; Selau, L. P. R.; Vahl, L. C.; Timm, L. C. Characterization of soil chemical properties of strawberry fields using principal component analysis. Revista Brasileira de Ciência do Solo, Viçosa, MG, v.37, n.1, p.168-176, 2013.
  • Jesus, S. C.; Epiphanio, J. C. N. Sensoriamento remoto multissensores para a avaliação temporal da expansão agrícola municipal. Bragantia, Campinas, v. 69, n. 4, p. 945-956, 2010.
  • Jolliffe, I. T. Principal component analysis. 2nd ed. Springer, 2002. 487 p.
  • Marques, F. A.; Silva, D. D.; Ramos, M. M.; Pruski, F. F. Sistema multi-usuário para gestão de recursos hídricos. Revista Brasileira de Recursos Hídricos, Porto Alegre, v.14, n.4, p.51-69, 2009.
  • Melo Júnior, J. C. F.; Sediyama, G. C.; Ferreira, P. A.; Leal, B. G. Determinação de regiões homogêneas quanto à distribuição de freqüência de chuvas no leste do Estado de Minas Gerais. Revista Brasileira de Engenharia Agrícola e Ambiental, Campina Grande, v.10, n.2, p.408-416, 2006.
  • Mishra, A. K.; Coulibaly, P. Hydrometric network evaluation for Canadian watersheds. Journal of Hydrology, Amsterdam, v.380, p.420-437, 2009.
  • Mwale, D.; Gan, T. Y.; Devito, K. J.; Silins, U.; Mendoza, C.; Petrone, R. Regionalization of runoff variability of Alberta, Canada, by wavelet, independent component, empirical orthogonal function, and geographical information system analysis. Journal of Hydrologic Engineering, New York, v.16, n.2, p.93-107, 2011.
  • CBH-Doce – Comitê da Bacia Hidrográfica do Rio Doce. Plano Integrado de Recursos Hídricos da Bacia do Rio Doce. Available at: <http://www.cbhdoce.org.br/documentos/pirh/plano-diretor-da-bacia-do-doce-pirh/>. Accessed: Mar. 2010.
  • Reid, M. K.; Spencer, K. L. Use of principal components analysis (PCA) on estuarine sediment datasets: The effect of data pre-treatment. Environmental Pollution, Barking, v.157, p.2275-2281, 2009.
  • Rencher, A. C. Methods of multivariate analysis. 2nd ed. Wiley-Interscience, 2002. 708 p.
  • Resendes, A. P. C.; Silveira, N. A. P. R.; Sabroza, P. C.; Souza-Santos, R. Determinação de áreas prioritárias para ações de controle da dengue. Revista de Saúde Pública, São Paulo, v. 44, n. 2, p. 274-282, 2010.
  • Ribeiro, C. B. M.; Marques, F. A.; Silva, D. D. Estimativa e regionalização de vazões mínimas de referência para a bacia do rio Doce. Engenharia na Agricultura, Viçosa, v.13, n. 2, p. 103-107, 2005.
  • Samuel, J.; Coulibaly, P.; Metcalfe, R. A. Estimation of continuous streamflow in Ontario ungauged basins: comparison of regionalization methods. Journal of Hydrologic Engineering, New York, v.16, n.5, p.447-459, 2011.
  • Silva, S. F. R.; Matos, D. C.; Silva, S. L.; Daher, E. F.; Campos, H. H.; Silva, C. A. B. Chemical and morphological analysis of kidney stones: A double-blind comparative study. Acta Cirúrgica Brasileira, São Paulo, v. 25, n. 5, p. 444-448, 2010.
  • Sousa, H. T.; Pruski, F. F.; Sousa, J. F.; Bof, L. H. N.; Cecon, P. R. Sistema computacional para regionalização de vazões – SisCoRV 1.0. Viçosa: Universidade Federal de Viçosa, 2008.
  • Wilks, D. S. Statistical methods in the atmospheric sciences. London: Academic Press, 2006. 630 p.
  • Yamaki, M.; Menezes, G. R. O.; Paiva, A. L. C.; Barbosa, L.; Silva, R. F.; Teixeira, R. B.; Torres, R. A.; Lopes, P. S. Estudo de características de produção de matrizes de corte por meio da análise de componentes principais. Arquivo Brasileiro de Medicina Veterinária e Zootecnia, Belo Horizonte, v. 61, n. 1, p. 227-231, 2009.
  • 1
    Part of the first author's doctoral thesis. Project funded by FAPEMIG, CAPES and CNPq.

Publication Dates

  • Publication in this collection
    Sep-Oct 2015

History

  • Received
    22 Apr 2013
  • Accepted
    7 Feb 2015
Associação Brasileira de Engenharia Agrícola SBEA - Associação Brasileira de Engenharia Agrícola, Departamento de Engenharia e Ciências Exatas FCAV/UNESP, Prof. Paulo Donato Castellane, km 5, 14884.900 | Jaboticabal - SP, Tel./Fax: +55 16 3209 7619 - Jaboticabal - SP - Brazil
E-mail: revistasbea@sbea.org.br