Hydrological responses in equatorial watersheds indicated by Principal Components Analysis ( PCA ) – study case in Atrato River Basin ( Colombia )

The Atrato river basin is located in the Pacific fringe of Colombia, region with one of the highest precipitation rates in the world. The main purpose of this study is to determine the dominant processes in the hydrological responses along 17 sub-basins within the basin using principal component analysis. Watersheds located at the headwater presented a fast or medium response to the precipitation events, while higher flow homogeneity was observed in watersheds located at the lower portions of the basin. Three principal components were responsible for explaining 85.18% of the total variance. The component PC1 revealed the largest contributions for low flow behavior, being associated to precipitation, characteristic discharge values, compactness index, soil coverage and soil coarse textures. The component PC2 was assigned to the geological variables, fine and average texture soil and the average basin slope. Finally, the component PC3 has shown to be related to high flow patterns (maximum characteristic discharge values Q5 and Q1), igneous rocks and length of the basin. Highest specific discharge was associated to alluvial deposits and forest cover, whereas the slope was considered determinant for the run-off generation.


INTRODUCTION
In tropical and equatorial regions, the atmospheric convergence processes and moisture transport from the oceans are responsible for high rainfall events, which generate fast responses in water discharge in river channels (Strauch, 2013;Strauch et al., 2015). Climate change prediction models (Gleeson et al., 2020) run for equatorial and tropical regions revealed a decrease in rainfall rates. Catchments located in these regions are, therefore, highly susceptible. Changes in their hydrological regime may impact the provision of regulatory ecosystem services.
Undoubtedly, it is a fact of relevance, to be taken into consideration within regional and local water resource management policies (Meixner et al., 2016;Ng et al., 2010;Sophocleous, 2002).
The Colombian territory is affected by three major sources of humidity: the Caribbean Sea, the Pacific Ocean and the Amazon rainforest. It presents a complex climate configuration, with remarkable spatiotemporal variations of precipitation associated to a large physiographic variability. Under such circumstances, the prediction and estimation tasks for hydroclimatological variables become complex (Poveda Jaramillo, 2004). In addition, there are other local factors such as the orography (with a maximum altitude of 3,945 masl) and climate features, such as El Niño Southern Oscillation (ENSO) among others, that may interfere on rainfall intensity and seasonality (Rueda & Poveda Jaramillo, 2006;Garreaud, 2009;Durán-Quesada et al., 2012). Recent simulations revealed water scarcity risk scenarios for the Andean region and flooding in the coastal regions associated with sea level rise (Intergovernmental Panel on Climate Change, 2014a, 2014b; Hurtado & Mesa, 2015;Pardo & Alfonso, 2018).
The Colombian Pacific Region presents the highest precipitation rates observed in the South America. They are associated with some climatic features, such as the migration of the Intertropical Convergence Zone (ITCZ), the Low Level Jet Stream of the Colombian West Region (Chorro del Occidente Colombiano, CHOCO), the meso-scale convective systems and an the orographic barriers represented by the Andes (Poveda Jaramillo & Mesa, 2000;Poveda Jaramillo et al., 2014).The Atrato River Basin, located in the northwestern region of Colombia, is mostly influenced by the moisture originated in the Pacific Ocean and the Caribbean Sea (Instituto de Investigaciones Ambientales del Pacífico, 2013; Poveda Jaramillo et al., 2014). Due to these high precipitation rates, the Atrato River presents the highest specific discharge among all the other rivers in Colombia (Leyva, 1993).
Actually, besides bearing heavy rainfall and high seismicity, the Atrato River Basin (hereafter ARB) is facing increasing demands for natural resources (fauna and flora) and several conflicts related to land use, illegal mining and uncontrolled deforestation (Vélez & Aguirre, 2016). These stresses increase the socioeconomic vulnerability of the population living in the basin and the occurrence of natural disasters, such as flooding and landslides (Mosquera Machado, 2006).
Due to its complexity, the ARB is considered an ideal target for multidisciplinary scientific research to assess hydroclimatological dynamics (Garreaud, 2009;Célleri & Feyen, 2009) and to provide knowledge to cope with climate change impacts.
The hypothesis here is that in such a complex environment, the hydrologic properties of the basin, such as water storage, water movement and baseflow characteristics are highly affected by the watershed topography and geomorphology (Price, 2011). Additionally, according to Coopersmith et al., 2012), the landscape framework is shaped directly by the regional climatic conditions, which, also determines the response of stream flows to rainfall intensity (Berhanu et al., 2015).
Among the available hydrological tools to evaluate the response of stream flows to the physical and climatic variables, the evaluation of Flow Duration Curves (FDC) is a widely accepted methodology with recognized practical efficiency (Ilstedt et al., 2016). It is based on graphical and statistical methods to identify the influence of the physical and climatic variables of the basin and how far they describe its hydrological behavior (Smakhtin, 2001;Brown et al., 2005). Water flow impact assessments using the FDC approach, considering forest cover, soil and geology framework, are offered by Best et al. (2003); Brogna et al. (2017);Peña-Arancibia et al. (2019).
Due to the large number of variables that affect the FDC responses, multivariate techniques can be applied precisely to identify the most dominant ones in hydrological processes. In these cases, it is recommended to use cluster analysis and/ or the clusters obtained from the Principal Component Analysis (PCA) (Galbraith et al., 2002;Abdi & Williams, 2010).
The application of PCA techniques has the great advantage since it reduces the number of the variables used to explain the hydrological behavior, by retaining only those principal components that explain the most significant portion of the data variance (Westra et al., 2007). It is a way of complementing the information obtained through the FDC analysis.
The PCA techniques have been used in hydrology: (i) to identify cause-effect relationships; (ii) to detect redundant variables; (iii) to further identify zones with homogeneous climatic or hydrological characteristics, and also; (iv) to perform analysis of morphometric variables in watersheds (Richman, 1981;Westra et al., 2007;Singh et al., 2009;Sharma et al., 2015).
These techniques have been applied in several watersheds located in the mountainous Andes. Crespo et al. (2011) andMasiokas et al. (2019) were able to identify a differentiation between basins based on hydrological series and types of soils and also patterns of spatial variability on runoff using monthly river discharge values respectively. The study of Boscarello et al. (2016) discussed the morphometric and climatological key drivers on hydrological dynamics.
Despite the hydrological importance of the watersheds located in Andean regions, such as the Atrato river basin, few studies had discussed the impacts of land use changes and how the hydrological knowledge could be improved by incorporating them in the overall analysis (Ponette-González et al., 2014).
In order to understand the hydrological controls of the Atrato river flow a key scientific question has been proposed: What are the most influential parameters that control the hydrological response of the sub-basins of the Atrato river and how to determine them? Having this question as a guideline, the objective of the present study is to identify and to characterize the physiographic and morphological variables that control the hydrological responses of the Atrato river sub-basins, based on data from a long term monitoring program .

3/12
This study is pioneering as it represents the first attempt to assess the relationships between hydrological and physiographic variables using statistical multivariate approach for the ARB. The results will provide an understanding of land use and vegetation cover on water availability in the region, therefore, pillars for the preservation and conservation of the ecosystems and for the integrated water management policies.

STUDY AREA
The ARB is located in the northwestern portion of the Colombian territory ( Figure 1A), in the Chocó biogeographic region. It occupies an area of about 35,000 km 2 from its headwaters, located in the western Colombian Andes, to its mouth in the Caribbean Sea ( Figure 1B).
The geological framework of the basin consists of a sedimentary sequence represented mainly by marine deposits, which can reach thicknesses of 10,000 m, deposited between the Lower Eocene and Pliocene, hereafter sediments, over a Cretaceous igneous-sedimentary basement, hereafter denominated basement. This sequence is covered by silt-clay and sandy alluvial deposits with high organic matter content, without cementation and poorly consolidated, hereafter alluvial (Alcárcel & Gómez, 2017).
This complex geological framework can be simplified in 3 subgroups according to the lithological and hydrogeological characteristics of the outcropping sequences ( Figure 1D): A unit where sedimentary rocks prevail (48%), followed by a unit composed of Quaternary deposits (32%), and finally, a unit built by igneous formations (20%).
The soils of the ARB are of mainly medium-fine, medium and coarse-medium textures ( Figure 1E). Forests and natural cover represent 80% of the ARB area ( Figure 1C), followed by agricultural areas with 15.6%. It is noteworthy that only 0.3% of the basin can be considered anthropized, essentially due to urban settlements, mining projects, roads and industrial districts (Instituto Geográfico Agustín Codazzi, 2007).
The discharge at the mouth of the Atrato River is approximately 4,900 m 3 .s -1 , with a specific discharge of 0.141 m 3 .km -2 .s -1 , that is directly related to the average rainfall of 4,600 mm.year-1 with an average temperature of 24.3°C.
The high precipitation rates observed are a result of the orographic effects over the moisture coming from the Pacific Ocean (Guzmán et al., 2014;Poveda Jaramillo et al., 2014), which are then generated due to permanent convective activity in the region throughout the year (Zea et al., 2000), as well as the influence of the "CHOCO jet" presents larger magnitudes (Hoyos et al., 2018). This climatic and orographic effect combination is responsible Hydrological responses in equatorial watersheds indicated by Principal Components Analysis (PCA) -study case in Atrato River Basin (Colombia) for the existence of different types of hydrometeorological cycles, in regard to precipitation and to river discharge as well (Urrea et al., 2019).

MATERIAL AND METHODS
The ARB was divided in 17 sub-basins ( Figure 1), which had been monitored by gauging stations. Discharge records on a daily time scale are practically continuous, with periods ranging from 17 to 46 years of observation and with a percentage of missing data between 0% and 20%. Morphometric parameters and physiographic variables are presented in Table 1. In addition to the fluviometric stations, rainfall data from 17 stations, one for each sub-basin, with daily readings and records ranging from 22 to 71 years and 0 to 16% missing rate, were considered in the analyzes For the sake of data homogenization, a common period of hydrometeorological observations between 1984 and 2017 was selected. The fluviometric and rainfall data were obtained from the Water Resource Information System (SIRH) of the Colombian Institute of Hydrology, Meteorology and Environment (IDEAM) (Instituto de Hidrología, Meteorología y Estudios Ambientales, 2014).
The most important morphometric, physiographic and discharge variables that seemed to build up hydrological responses at the ARB were evaluated. The morphometric variables considered here were: area (A), perimeter (Pe), compactness index or Gravelius index (K) (Eagleson, 1970), basin length (LB) and average basin slope (ABS). A and P values were discarded from subsequent multivariate statistical analyzes due to their redundancy with K. Physiographic variables were defined from the land cover mapping and geological map (Table 1 and Figure 1).  Balbín Betancur et al.

5/12
The FDCs (Flow Duration Curves) were constructed with the flow series expressed as water column on daily time scale (mm. day -1 ), aiming to remove the effect of the basin area and allowing the comparison between the curves, as well as the evaluation of the hydrological responses based on the characteristic discharge values of the basins, (Q 1 , Q 5 , Q 10 , Q 50 , Q 70 , Q 90 and Q 95 ), where the number subscript correspond to the percentage of time where the discharge is exceeded. . The Ratios Q 90 /Q 10 and Q 90 /Q 50 were also calculated since they are considered indicators of maximum and minimum flows, respectively (Smakhtin, 2001).

Statistical analysis
A trend analysis was performed in order to identify changes in the mean, median of the data series (precipitation and discharge), as well as to verify the existence of trends or randomness that could affect the subsequent data analysis (Castro & Carvajal, 2010;Guajardo & Granados, 2010). The Trend Analysis 1.2 Software, developed by Cooperative Research Centre for Catchment Hydrology (CRCCH) of Australia (Chiew & Siriwardena, 2005) has been used for this purpose. Analysis included nonparametric statistical tests for all stations data series: (i) The Mann-Kendall test to assess the existence of trends; (ii) The CUSUM Free Distribution test to evaluate the average jumps; (iii) The Rank-Sum test to evaluate the differences in the median, where levels of significance allowed the identification of the null hypothesis acceptance evidences (H 0 ), defined as no trend, average jump, or median difference, respectively.
Correlation between the characteristic discharge values, morphometric and physiographic variables were calculated in order to verify the dependency of these variables, and if some of the correlation observed could be associated to the hydrological response of the watersheds. Initially, the Kolgomorov-Smirnov test was applied to evaluate whether the data presented a parametric or nonparametric distribution. As the result indicated a parametric distribution for the set of variables, the Pearson correlation test was then used, and the correlations were calculated using the R Heatmaply package (R Core Team, 2018). Heatmaply also deploys a seriation package to build a dendogram structure using a default option OLO (optimal leaf ordering) for optimization purposes.
Due to the great number of variables used in this study, the principal component analysis (PCA) has been applied to reduce the large initial set of correlated (or not) variables into a new orthogonal and uncorrelated set of factors called principal components (PC), that account to explain the variance in the original data set. This new representation has compressed the data and kept only the most important information, turning the data systematization into a more simplified exploratory task (Legendre & Legendre, 2003;Jackson, 1991;Abdi & Williams, 2010).
Original variables were initially normalized in order to remove the effect of measurement units, rendering the data dimensionless, as expressed below: Zi is the standardized variable i, Xi is the original variable i, μ is the mean and σ is the standard deviation of the dataset (Davis, 1973).
As soon as the data has been normalized, the PCA analysis involved the calculation of the correlation matrix R and the determination of the eigenvalues (λ1, λ2,…, λp, where p is the number of original variables), and the corresponding eigenvectors (a1, a2, …, ap), through the matrix equation, where "I" is the identity matrix: The composition of the principal components was performed by the application of matrices of eigenvectors as the factors in a linear combination of standardized variables (Noori et al., 2010).
These principal components are orthogonal to each other and are arranged in a descending order based on the amount of variance explained. Thus, the first principal component explains the higher amount of the total variance from the original variables, while the second one captures the higher proportion of the total variance, which is not encapsulated by the first one and so, further on. The retention of meaningful components is done on the basis of the sum of total explained variance.
Eigenvalues and eigenvectors were extracted from the covariance matrix of the original variables using the STATISTICA® software. The components that explained the variance of the data set were identified and ranked hierarchically into groups with the same dominant component. The correlations obtained from the PCA analysis were compared and the variables holding the greatest influence on the hydrological behavior of the different basins were finally defined.

Trend and statistical analysis
No significant trends were identified for the precipitation series in 71.4% of the evaluated rain gauging stations. In the remaining stations, the statistical analysis indicated the existence of two trends in the average precipitation magnitudes: (i) An increasing trend comprised of three stations; (ii) A decrease trend in the other two stations showing different significance levels (with 3.1% of cases rejecting H 0 with a significance for α = 0.10, for α = 0.05 in 8.3% of cases and for α = 0.01 in 17.2% of cases). Statistical test of changes in averages indicated that in 43% of the rain gauges stations no changes have been observed since and passed the H_0 of the CUSUM free distribution test. On the other hand, in 23% of the stations the H_0 test could be rejected with a significance level of α = 0.10, indicating slightly evidences to accept or reject H_0; 14% and 20% of stations, respectively, rejected H_0 with α = 0.05 and α = 0.01, indicating that these stations had a nonparametric behavior, corroborated by the existence of trends.
In 59% of the river gauging series there was no observed trend ; the remaining stations (41%) showed positive trends (indicated by an increase in the average flow magnitudes); 12% revealed statistical significance with little evidence of trends (α = 0.10); 17% had average evidence of trend (α = 0.05); and 12% showed strong indications for tendency (α = 0.01) in the series, specifically the El Siete and Negua stations ( Table 2).
Hydrological responses in equatorial watersheds indicated by Principal Components Analysis (PCA) -study case in Atrato River Basin (Colombia) It was also shown that 59% of the discharge series did not present evidences of changes; while 24% of the stations revealed slight evidence of average changes for significance levels α = 0.10; and 18% presented changes for significance levels for α = 0.05 ( Table 2).
Rain and river gauging stations with trends of changes are situated in the same region, where nonparametric precipitation behaviors were detected. Nevertheless, there are also some river stations where discharges had increased regardless of precipitation patterns. This behavior cannot be directly attributed to climate change and there are errors associated to the monitoring program and analysis.
Characteristic discharge values for the study sub-basins and hydrological indicators were estimated from the FDC's (Table 3). Large variation in the magnitude of the characteristic discharges were observed. Variations reached more than 10 times to Q1 and about 5-6 times for the other characteristic discharges. Despite the similarity observed in slopes and trends for most of the FDC's curves, with exception of sub-basins 9 and 13 (single group with a high slope in discharge values, blue line in Figure 2), the other basins can be grouped in two groups based on their characteristic discharge values. The first group, representing the headwater basins, presented lower values for the characteristic discharge values (brown lines in Figure 2), indicating a low water release associated to the precipitation events. The second group, located at the lower part of the river, is characterized by higher characteristic discharges values (green lines in Figure 2); their flow rates are higher and more stable over time, which could be related to slow responses to the precipitation events and a constant pattern of aquifer contribution.
The hydrological and physiographic variables can be divided in two groups, according the dendogram presented in Figure 3. The first group (green part of the dendogram) encompasses the following variables (Al) alluvium coverages, (S) sedimentary rocks; (B) basement, average and fine texture soil cover (AT and FT), crops (C) and average basin slope (ABS). The second group of variables (red part of the dendogram) encompasses the morphometric variables (K index and length of the basin (LB), mean annual precipitation (P), coarse and other soil textures (CT and OT), forest and other soil coverage (F and OC) and the characteristic discharge values, which were assembled in two sub-groups: (i) lower discharge values (Q50, Q70, Q90 and Q95), associated to other coverages (OC) and K index; (ii) higher discharge values (Q10, Q5 and Q1).
The Pearson correlation between precipitation and characteristic discharge values can be assumed to be valid, because no significant trends in precipitation and discharge series were observed, and the values indicated a positive correlation, ranging from 0.70 to 0.93 between average precipitation (P) and characteristics discharge (Q95-1). A higher correlation was  Figure 1, and details about groups division on the main text. observed with Q50 (r=0.93), with tendency to decrease towards greater and lower discharges. Some correlations between characteristic discharge values and physiographic variables were also identified ( Figure 3). The Q95-50 presented high correlation with the K index and LB, respectively about 0.78 and 0.65. On the other hand, no correlation was observed with Q1-10. Regarding ABS a negative correlation was observed, increasing to higher characteristics discharge.
Land use patterns and soil textures seem to represent an important control factor on river discharges: (i) Forest and other soil coverage areas resulted in a positive correlation, decreasing towards higher discharge (0.75<r<0.39 and 0.88<r<0.36, respectively); (ii) Inverse correlation with crop areas, increasing towards higher characteristic discharge values (-0.39<r<-0.75): (iii) Areas covered by fine textures soils do not show strong correlation with the characteristic discharge values, increasing towards higher characteristic discharge (-0.20<r<-0.45); (iv) The soil coarse textures rendered a strong correlation with lower characteristic discharge values, indicating the storage role played by this soil and the discharges during low rainfall periods; (v) No important correlation with geological framework has been observed ( Figure 3).

Principal Components Analysis (PCA)
The PCA showed that the first three components were responsible by explaining 85.18% of the total variance of the data ( Table 4). The principal component 1 (PC1), which accounted for 51.35% of total variance, had the largest influence on the characteristic discharge values (P, Q 95 , Q 90 , Q 70 , Q 50 and Q 10 ), followed by the K index, associated with forest (F),other soil textures (OT) and soil coarse textures (CT). These variables were grouped in the red portion of the dendrogram (Figure 3).
The principal component 2 (PC2) explained 18,84% of the total variance. The variables presenting higher loads were those related to the geological features (areas covered by sedimentary rocks and alluvial deposits), fine and average texture soil and the average basin slope (ABS), (green portion of the dendrogram at the Figure 3). Finally, the principal component 3 (PC3) explained 14.99% of the total variance and showed the highest weights associated with the maximum characteristic discharge values (Q 5 and Q 1 ), basement (B) and length of the basin (LB), (red portion of the dendrogram except for basement at the Figure 3).
The variables associated with PC1 were considered the most determinant for the flow control in the sub-basins presenting maximum specific discharge along the historical series. Discharge, in this case, is due to land cover occupied by forested areas and coarse soil textures. This factor is associated to the sub-basins belonging to the Group II of FDCs (green lines at Figure 2 and green triangles at Figure 4A). The PC1 showed a positive load of eigenvalues for the magnitudes of the specific discharges and the precipitation (Table 4).  On other hand, the PC2 was interpreted as responding for the discharge processes and has hosted the variables related to the geological settings ( Figure 4B). Most of the sub-basin belonging to Group II are characterized by a higher percentage of alluvial deposits, except sub-basins 12 and 16 which presented remarkable areas of basement rocks, with a hydrological response similar to observed in the sub-basins belonging to the Group I. As a way of verifying the relationship between geological variables and minimal discharges, some authors have suggest detailing the geological discretization of the study area (Carlier et al., 2018).
Sub-basins 14, 6 and 7, despite the fact of being located in the upper part of the ARB and presenting an opposite hydrological response, seemed to have flows controlled by the geological framework. The predominance of alluvial areas in these sub-basins could attenuate the hydrological responses and make these sub-basins to behave similar to the ones of group II. Finally, similar to PC1, the PC3 was associated with sub-basins in which the runoff is dominant ( Figure 4C).
The flow magnitudes presented in the permanence curves ( Figure 2) may also be directly related to the principal components. The sub-basins of the Group I showed negative PC1 and PC3 values, indicating lower values for the specific discharge. They are constituted by higher percentage of agricultural land and sedimentary deposits with higher ABS. Conversely, the Group II sub-basins presented positive PC1 and PC3 values, indicating higher specific discharge. They present higher percentage of forest areas and alluvial deposits and do have lower ABS and greater magnitude in precipitation, compared to the previous ones. The PC1 maximum loads of variables were associated to the characteristic discharge values and to the forest cover. On other hand, the areas dominated by agricultural land cover seemed to favor the runoff generation. According to Le Maitre et al. (1999), agricultural areas promote changes in the macro and microstructure of soils preventing water infiltration and other processes.
The relationships between fine, average textures and the ABS, responsible for building up higher negative loads for the PC2, lead to slow discharges, as previously stated by Brown et al. (2013). These authors observed small changes in lower discharges in watershed dominated by fine soils and lower slopes in contrast to watersheds dominated by coarser soils and higher slopes. On the other hand, areas bearing high hydraulic conductivities coincident with areas presenting lower to average slopes seemed to facilitate  Balbín Betancur et al.

9/12
water infiltration and storage, increasing the permanence of minimum flows and base flow (Smakhtin, 2001). In PC3, the amount of surface runoff is evidenced as a determinant in the composition of maximum flows in areas with predominance of basement rocks. In comparison with areas dominated by other lithologies, related to PC2, the basement rocks tend to decrease the soil water infiltration capacity, facilitating the surface runoff and delivering higher magnitudes of maximum flow rates. The sub-basin specific discharge can be analyzed through the different types of FDC's ( Figure 2). The sub-basins with type II curves that are predominantly covered with forest seemed to present a high production of water leading to a more regulated discharge pattern. This is due to the control exerted by interception and evapotranspiration, which tends to homogenize the rain arrival in the soil and subsequent infiltration and water release. The sub-basins, which are characterized by the type I FDC's, situated in higher slopes areas, are covered by agricultural activities and do have the lowest specific discharge. In agricultural areas, due to the lack of vegetation the river flow responses to the precipitation events are faster and therefore its specific performance is lower (Brown et al., 2013).

CONCLUSIONS
In the studied area, the specific discharge variations are mainly controlled by land cover (forest and agriculture), the presence of coarse soils, and, to a lesser extent, by the geological framework (alluvial deposits versus sedimentary rocks). Areas covered by forest associated to coarse soil presented the higher capacity to regulate low flow. On the other hand, sub-basins with intense agricultural activities presented higher instantaneous discharge, due to higher runoff as a combined effect of evapotranspiration rate reduction and the roughness of the terrain.
Despite the low correlation, the geological setting is an important discharge control factor, as evidenced by the grouping of variables from PC2. However, to corroborate this relationship, a more detailed geological assessment is required. The geological framework of the ARB also plays an important role on discharge. The sub-basins situated on more permeable rocks showed higher specific discharge. This is an important aspect to be considered when it comes to water management practices and policies under climate changes scenarios.
Despite the concordant results between FDCs and PCA analysis, the FDCs analysis must be understood as a preliminary assessment of the key factors related to the hydrological responses. Complex areas, such as the ARB, require caution in the application of FDCs, since some simplifications could affect subsequent analyzes.
The statistical support given by the PCA analysis has minimized biases in the identification of such factors. The methodology used within this study proved to be a robust tool for understanding the hydrological response and water performance of complex and heterogeneous regions.