ABSTRACT
Completing missing hydroclimatological data is crucial for effective water resource management, especially in remote regions such as Colombia's Orinoquía. This study proposes a method for imputing monthly streamflow data from 1985 to 2020 by applying Nonlinear Principal Component Analysis (NLPCA) combined with CHIRPS precipitation data. Sensitivity analyses revealed that including precipitation as an exogenous variable significantly enhances imputation accuracy. The method was applied to four hydrometric stations in the Cravo Sur River basin, with missing data rates ranging from 5% to 63%. The performance of the imputed data was evaluated using Spearman correlation coefficients (0.95-0.99), BIAS (0.2% to -2.1%), and root mean square error (RMSE) values (0.79 to 1.93 m3/s). These results demonstrate the method's robustness and adaptability across varied hydroclimatic conditions within the Orinoquía region.
Keywords:
NLPCA; CHIRPS; Cravo Sur; filling in missing data; hydrometric gauges
RESUMO
O preenchimento preciso dos dados hidroclimatológicos ausentes é crucial para a gestão eficaz dos recursos hídricos, especialmente em regiões remotas como a Orinoquía, na Colômbia. Este estudo propõe um método para imputar dados mensais de fluxo de água de 1985 a 2020, aplicando a Análise de Componentes Principais Não Lineares (ACPNL) combinada com dados de precipitação do CHIRPS. Foram realizadas análises de sensibilidade que revelaram que a inclusão da precipitação como uma variável exógena aumenta significativamente a precisão da imputação. O método foi aplicado em quatro estações hidrométricas na bacia do rio Cravo Sur, com taxas de dados ausentes variando de 5% a 63%. O desempenho dos dados imputados foi avaliado usando coeficientes de correlação de Spearman (0,95-0,99), BIAS (0,2% a -2,1%) e valores de erro quadrático médio (RMSE) (0,79 a 1,93 m3/s). Esses resultados demonstram a robustez e a adaptabilidade do método em condições hidroclimáticas variadas na região da Orinoquía.
Palavras-chave:
ACPNL; CHIRPS; Cravo Sur; preenchimento de falhas dos dados ausentes; estações hidrométricas
INTRODUCTION
Hydrology is a natural science that analyzes the distribution, behaviour and physical properties of water, as well as its close relationship with the natural environment and the living beings that inhabit it. The study of the hydrological cycle can be simplified to the interpretation of data and time series (Ly et al., 2013). For this reason, data of sufficient quality and quantity are needed to represent the hydroclimatological variables of interest.
Hydroclimatological records usually contain gaps caused by faulty measuring instruments, lack of maintenance, human error, lack of access, and scarce resources to operate the stations, among others (Castro-Heredia et al., 2012; Hamzah et al., 2020). Missing hydroclimatological data affect many regions of the world, which is why several studies have sought imputation methods whose estimates are as close to reality as possible. For example, Hamzah et al. (2021) propose an approach to estimate missing flow using data-driven methods and data fusion techniques; the authors combine data from various sources, including satellites, flow sensors, and hydrometric stations, to improve the accuracy of the estimates. This approach can be employed when multiple data sources are available for the hydrologic variable.
In turn, Duarte et al. (2022) compare several methods for completing missing data in hydrological time series; the authors analyze methods based on statistical models, machine learning and hybrid approaches, and evaluate their performance in terms of accuracy and efficiency. Their study provides an overview of the techniques that can be used to fill in missing data and can help to select the most appropriate method according to the characteristics of the data and the objectives of the study. Canchala-Nastar et al. (2020), for their part, studied the variability of the flow rates of two rivers of the Colombian Pacific and completed the data through Non-Linear Principal Component Analysis (NLPCA), obtaining good approximations in the imputed data using only the variability of the flow data.
Similarly, Jiang et al. (2020) present a method for filling missing flow rate data in water distribution systems using machine learning; the authors use artificial neural networks to predict missing flow rate values and demonstrate that their approach can improve the accuracy of flow rate estimates. This approach can be helpful when a large amount of flow data is available and high estimation accuracy is required. Umar & Gray (2023), in turn, filled missing flow data using multiple imputation and generalized additive modelling, and demonstrated that this approach can significantly improve the accuracy of flow estimates compared to other methods. This approach may be relevant when auxiliary information is available to impute missing flow data.
Hydroclimatological data differ in quantity and quality from region to region. In Colombia, for example, the Andean region has 1381 active stations, representing 59% of the country's stations, while covering 19% of the national territory; this is equivalent to 1 station every 160 km2. By contrast, the Amazon-Orinoquía region has 464 stations distributed over 658,000 km2, or 1 station every 1400 km2 of territory. This is mainly because 68% of the population is concentrated in the Andean region, which accounts for 90.7% of the GDP. Paradoxically, the areas of the Colombian territory with the highest water production are the most sparsely monitored: the Amazon-Orinoquía region (1 station/1400 km2) and the Pacific region (1 station/1442 km2) are the regions with the fewest temporal records available.
In the Colombian Orinoco region, the scarcity of meteorological and hydrological stations poses a significant challenge to the study of water supply in the hydrographic basins. The limited number of existing stations has resulted in substantial data gaps and therefore in a lack of accurate and reliable information. This, in turn, severely impedes hydrological modelling and decision-making in water resource planning, risk management, and flow estimation. To address these limitations, this study uses monthly flow records in the Cravo Sur River basin from 1985 to 2020, a period of 36 years (432 months).
Building on the above, this research developed a technique to impute missing flow data in the Cravo Sur River basin. This basin has great socioeconomic importance in the Colombian Orinoquía due to the significant economic development around rice crops, extensive cattle ranching, and the large population (ca. 250,000 people) living close to the basin (Colombia, 2018). Proper water management in the basin therefore requires sound knowledge of the flows of the Cravo Sur River, especially in periods of low water levels, when water abstraction from the river is regulated mainly with regard to crops with high water demand, such as rice. The Cravo Sur is a large river with an average monthly flow of up to 255 m3/s and reported maximum flows of up to 693 m3/s near its confluence with the Meta River, according to data from the La Estación station operated by the Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM).
The methodology proposed in this study has the potential to be replicated in other basins of the Colombian Orinoco and Amazon regions. In addition, it could be evaluated in other regions of South America with similar characteristics, as an alternative in contexts with scarce information. Given its low cost, and given the need for tools and resources that improve the quality and quantity of information available for decision-making, the method would help to obtain the information required for adequate water resource planning, to prevent water-related risks, and to guarantee sustainable water use.
METHODOLOGY
Study area
The Cravo Sur River has its sources in the steep and unique relief of the Eastern Cordillera of the Andes at an altitude of 3,800 m.a.s.l. between the Serranía de Peña Negra and the Páramo de Cadilla (Figure 1). Geographically, the basin is located between 4°41’13” and 5°56’37” North Latitude (N) and 71°34’09” and 72°46’28” West Longitude (W), with a total area of 5,651.13 km2 and an oval-oblong to rectangular-oblong shape (Colombia, 2018, 2023).
After flowing 205 km from its source, the Cravo Sur River joins the Meta River (Figure 1), which is a tributary of the Orinoco River (Instituto de Hidrologia, Meteorologia e Estudos Ambientais, 2013). The main tributaries of the Cravo Sur, the Tocaría and La Niata rivers, are also monitored, at the El Playón and Puente Carretera stations, respectively. No station monitors the inflow from other tributaries such as the Siamá, Ogontá, Negro, and Chiquito rivers. Further details regarding these sub-basins can be found in Corporinoquia & Corpoboyacá (Colombia, 2018).
This study focuses on four sub-basins—Cravo Sur Alto, Tocaría, La Niata, and Cravo Sur—which are of interest for their hydrological context (see Figure 1).
The Tocaría River sub-basin has its source in the Guevarrica hill, at heights of 3,200 m.a.s.l., and its geomorphology is predominantly shaped by the interaction of geological formations such as the Las Juntas sandstones and Macanal shales. These formations directly influence the infiltration capacity and the behaviour of surface runoff and are relevant to local hydrodynamics, determining different erosion, sedimentation and water retention processes (Colombia, 2015a). The sub-basin is mainly occupied by agricultural land with extensive rice and short-cycle crop cultivation (Figure 2), which places ever greater pressure on natural resources. The expansion of agricultural land made possible by water availability has contributed to a net reduction of forest area by 30% during the last 20 years, making the soil more susceptible to erosion and eliminating biodiversity (Instituto Geográfico Agustín Codazzi, 2021; Colombia, 2010). This degradation has directly affected the hydrological regulation and ecological functions of the watershed (Colombia, 2015a).
Sustainable land use is also extremely challenging in the Cravo Sur Alto sub-basin, in the upper part of the Cravo Sur basin. Composed of mountainous regions and intermediate valleys (see Figure 2), this area also supports agricultural crops, including rice and maize. However, the increase in agricultural use has exacerbated soil loss and decreased soil water holding capacity, altering the hydrological balance (Colombia, 2015b). Growing demand for land and changes in runoff patterns have made more water bodies vulnerable to sedimentation.
The La Niata sub-basin has steep slopes, making it prone to fast runoff, which increases its susceptibility to erosion and to flooding, especially in rainy seasons with heavy rainfall. This sub-basin has become highly homogenized over the last 15 years (Instituto Geográfico Agustín Codazzi, 2020), with a loss of 40% in natural vegetation cover. The loss of this vegetation has exacerbated the effects of global warming: the soil retains less water, and the potential for hazards such as landslides and debris flows has risen.
The lower Cravo Sur is the largest sub-basin in our study area; it encompasses nearly flat areas surrounding the Meta River (see Figure 2) and is a biodiversity hotspot. Yet the large-scale expansion of agricultural land and livestock worsens the already high sedimentation of water bodies, degrades water quality, and affects aquatic ecosystems. Unsustainable agricultural practices have further aggravated such pressures, increasing slope erosion, particularly during extreme rainfall events (Colombia, 2015b).
Data
Observed data
The flow data were obtained from the 4 IDEAM stations listed in Table 1 (see locations in Figure 1). The stations record average flows ranging from 8 m3/s at La Niata to 255 m3/s on the Cravo Sur River (see Table 1). The amount of missing data is considerable in most stations, reaching up to 63%, with a coefficient of variation of up to 86%. The common period of record is 36 years (432 months), from 1985 to 2020.
Statistical summary and results of trend and homogeneity tests for hydrometeorological stations managed by IDEAM in the Cravo Sur Basin.
The precipitation data were obtained from 4 stations listed in Table 1 (see locations in Figure 1). These stations record varying levels of precipitation, with averages ranging from 200 mm at Molinos de Casanare to 306 mm at El Morro (see Table 1). The amount of missing data is significant in most stations, reaching up to 37% missing data, with a coefficient of variation as high as 75%.
The Phillips-Perron (PP) stationarity test indicates that the flow series at the Puente Yopal and La Estación stations exhibit non-stationarity. This test assesses the null hypothesis of a unit root, implying non-stationarity, against the alternative hypothesis of stationarity. Consequently, statistical properties such as the mean and variance of flow at these locations display temporal variability. This non-stationarity may be attributable to significant anthropogenic and environmental factors affecting the Cravo Sur Alto basin (monitored by the Puente Yopal station) and, more broadly, to landscape and land-use transformations across the entire basin, including the lowland areas monitored by La Estación station.
Among the eight stations analyzed, only the Puente Yopal flow station exhibits a statistically significant shift in the median of its time series, with a likely change point detected in April 2001, as indicated by its exceptionally low p-value (refer to Table 1). The other stations display high p-values, so the null hypothesis of homogeneity within their time series cannot be rejected, which indicates temporal stability in the records of these stations.
Also, the same station (Puente Yopal) exhibits a significant upward trend in its time series, with a Sen's slope of 0.1343 and a p-value of 0.00005 in the Mann-Kendall test (see Table 1), indicating a statistically significant trend (p < 0.05). The precipitation stations La Chaparrera, Molinos de Casanare, and El Morro, as well as the flow stations La Estación and El Playón, display positive Sen's slopes, suggesting slight upward trends; however, their p-values exceeding 0.05 indicate that these trends are not statistically significant. Conversely, the Puente Carretera flow station shows no trend, with a Sen's slope of 0.0000 and a p-value of 0.826, confirming the absence of significant change in the series. Overall, the results suggest stability in the time series of the stations, except Puente Yopal, where a distinct and significant trend is observed.
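The trend statistics reported above can be illustrated with a minimal sketch of the Mann-Kendall test and Sen's slope. It assumes a plain (non-seasonal) test with a tie-corrected variance and a normal approximation for the p-value; the function names are hypothetical, and the source does not state which implementation or variant was actually used.

```python
import math
from itertools import combinations


def _is_missing(v):
    """True for None or NaN (assumed encodings of a missing month)."""
    return v is None or math.isnan(v)


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    x = [v for v in series if not _is_missing(v)]
    n = len(x)
    # S counts concordant minus discordant pairs (later value vs. earlier value).
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(x, 2))
    # Variance of S with the standard correction for tied values.
    ties = {}
    for v in x:
        ties[v] = ties.get(v, 0) + 1
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties.values())) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the normal approximation: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


def sens_slope(series):
    """Median of all pairwise slopes, in data units per time step."""
    pts = [(i, v) for i, v in enumerate(series) if not _is_missing(v)]
    slopes = sorted((vj - vi) / (j - i)
                    for (i, vi), (j, vj) in combinations(pts, 2))
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])
```

Sen's slope keeps the original time index of each value, so gaps in the record lengthen the time step between neighbouring observations rather than being ignored.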
Remote data
Precipitation data from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) product were used; the record runs from 1981 to the present and covers all of Colombia. The database was created and is managed by the United States Geological Survey (USGS) and the University of California, Santa Barbara. It provides data only over land areas between latitudes 50° S and 50° N, at all longitudes, with a resolution of 0.05°. Temporal resolutions are days, pentads, dekads, months, and years (Funk et al., 2014). The CHIRPS database was chosen in view of the previous validations performed for Colombia by Urrea et al. (2016), Funk et al. (2015), and Ocampo-Marulanda et al. (2022). As noted in the previous section, within an area of 5,651.13 km2 there are only three stations, and they are not evenly distributed spatially (see Figure 1). Furthermore, two of these stations have more than 20% missing data (see Table 1). This underscores the need to use alternative data sources, such as CHIRPS.
Proposed imputation method
Figure 3 presents a schematic overview of the research methodology, outlining two primary pathways: one where the NLPCA data imputation method (explained below) is applied solely to streamflow data and another where both streamflow and precipitation data are integrated to enhance the accuracy of flow data treatment. The methodology concludes with a comparative evaluation of imputation performance using three distinct metrics.
Methodology for imputation of missing flow rate data without exogenous variables
The NLPCA methodology, a neural-network-based technique known for its precision, was employed. This technique, defined by Scholz et al. (2005, 2008), has been used to complete missing hydroclimatological data by Miró et al. (2017), Canchala-Nastar et al. (2020), and Castillo-Gómez et al. (2023). In this methodology, the dimensionality of the precipitation and flow data is reduced to a small number of Nonlinear Principal Components (NLPC), which capture the main modes of variability. The neural network is trained in an encoding process that yields these components. Subsequently, the original data matrix is recovered by decoding the components using the inverse NLPCA technique. This process results in a reconstructed matrix that contains estimates of both the existing and the missing data.
The inverse NLPCA has an hourglass-type architecture [a-b-c], where a are the extracted nonlinear components, b are the nonlinear hidden components, and c are the approximated features. These c features are compared with the input time series, and the neural network is trained for up to n iterations until the mean square error (MSE) is minimized. The architecture and the NLPCA and inverse NLPCA methods are shown schematically in Figure 4. Additional information on the technique can be found in Canchala-Nastar et al. (2019).
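As an illustration of the decoding idea, the following is a minimal NumPy sketch of an inverse-NLPCA-style network with architecture [a-b-c]. The component values for each month are treated as free parameters and learned together with the weights, and the error is evaluated only where observations exist. The function name, layer sizes, learning rate and the Adam optimiser are assumptions made for this sketch; they are not the implementation or settings used in the study, and the sketch assumes every column has at least some observed values and non-zero variance.

```python
import numpy as np


def inverse_nlpca(X, n_components=1, n_hidden=3, n_iter=2000, lr=0.01, seed=0):
    """Reconstruct X (rows = months, columns = series, NaN = gap).

    Returns the reconstructed matrix and the masked-MSE history.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value was observed
    mean, std = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
    Xs = np.where(mask, (X - mean) / std, 0.0)
    n, c = X.shape
    params = {
        "Z": rng.normal(scale=0.1, size=(n, n_components)),   # a: components
        "W1": rng.normal(scale=0.5, size=(n_components, n_hidden)),
        "b1": np.zeros(n_hidden),                              # b: hidden layer
        "W2": rng.normal(scale=0.5, size=(n_hidden, c)),
        "b2": np.zeros(c),                                     # c: features
    }
    m1 = {k: np.zeros_like(v) for k, v in params.items()}
    m2 = {k: np.zeros_like(v) for k, v in params.items()}
    history = []

    def decode(p):
        H = np.tanh(p["Z"] @ p["W1"] + p["b1"])
        return H, H @ p["W2"] + p["b2"]

    for t in range(1, n_iter + 1):
        H, Y = decode(params)
        E = (Y - Xs) * mask                  # gaps contribute no error
        history.append(float((E ** 2).sum() / mask.sum()))
        # Backpropagation of the masked MSE.
        dY = 2.0 * E / mask.sum()
        dA = (dY @ params["W2"].T) * (1.0 - H ** 2)
        grads = {
            "W2": H.T @ dY, "b2": dY.sum(axis=0),
            "W1": params["Z"].T @ dA, "b1": dA.sum(axis=0),
            "Z": dA @ params["W1"].T,
        }
        # Adam update (a stand-in optimiser chosen for brevity).
        for k in params:
            m1[k] = 0.9 * m1[k] + 0.1 * grads[k]
            m2[k] = 0.999 * m2[k] + 0.001 * grads[k] ** 2
            m_hat = m1[k] / (1.0 - 0.9 ** t)
            v_hat = m2[k] / (1.0 - 0.999 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)

    _, Y = decode(params)
    return Y * std + mean, history
```

Because the output layer is linear and unconstrained, a network of this kind can return values outside the observed range, including negative flows; nothing in the sketch prevents this.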
Schematic diagram of the NLPCA method for the completion of missing streamflow data. Adapted from Canchala-Nastar et al. (2020).
The abovementioned method was evaluated using the four monthly flow time series of the Puente Yopal, Puente Carretera, El Playón, and La Estación stations. The initial matrix to be reconstructed by the NLPCA method was 432 × 4, corresponding to 432 months of information (1985-2020) and the four flow stations.
Methodology for imputation of missing flow data using exogenous variables
Monthly precipitation and flow data were combined and tested to complete the missing data for the latter variable. Previously, a Spearman correlation analysis was performed between flow and precipitation data to validate the potential use of this exogenous variable as a descriptor of the main mode of flow variability. Other authors have previously used Principal Component Analysis to describe the main modes of variability of the hydroclimatology of territory with good results (Cerón et al., 2020; Ocampo-Marulanda et al., 2021).
First, the hydrological sub-basins were delineated, and data matrices were constructed with the flow and precipitation variables. Each of the four selected hydrometric stations was taken as the closing point of the sub-basin it represents (van Oel et al., 2011). The delineation was based on the Digital Elevation Model (DEM) of the Cravo Sur River basin, obtained from ASF Data Search and assembled into raster mosaics in QGIS. The downloaded scenes came from the Sentinel 2 sensor, with a spatial resolution of 10 m, and are dated September and October 2019; they were made available by the United States Geological Survey (USGS). The watershed outline provided by IDEAM as part of the zoning and coding of watersheds in Colombia (Instituto de Hidrologia, Meteorologia e Estudos Ambientais, 2013) was used to extract the basin from the CHIRPS raster mosaic. We filled the imperfections of the DEM with the Fill tool, then determined the flow direction in each pixel from the height differences with the FlowDirection tool, and subsequently counted the number of pixels accumulated upstream of each pixel using the FlowAccumulation tool. With this process, it was possible to define the contributing area (number of accumulated pixels) of each closure point of interest, corresponding to the location of the hydrometric stations indicated in Table 1.
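The flow-direction and flow-accumulation steps can be illustrated with a small sketch of the widely used D8 scheme, in which each cell drains to its steepest downslope neighbour. This is a stand-in written in plain Python, not the GIS tools named above, and it assumes a DEM that has already been filled; whether the tools used in the study follow exactly this scheme is not stated in the text.

```python
import math


def d8_flow_accumulation(dem):
    """Count, for every cell, the number of upstream cells draining into it.

    dem is a list of rows of elevations on a regular grid (already filled).
    """
    rows, cols = len(dem), len(dem[0])
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    # Flow direction: the neighbour with the steepest positive drop,
    # using a diagonal distance of sqrt(2) cell widths.
    target = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr, dc in neighbours:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    drop = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if drop > best_drop:
                        best, best_drop = (rr, cc), drop
            target[(r, c)] = best            # None marks an outlet or a pit
    # Flow accumulation: visit cells from highest to lowest so that each
    # cell's count is final before it is passed downstream.
    acc = [[0] * cols for _ in range(rows)]
    ordered = sorted(target, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in ordered:
        t = target[(r, c)]
        if t is not None:
            acc[t[0]][t[1]] += acc[r][c] + 1
    return acc
```

For a closure point at cell (r, c), the contributing area would be (acc[r][c] + 1) multiplied by the area of one pixel.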
Once the sub-basins were delineated, CHIRPS precipitation data were extracted for each sub-basin. The flow time series were grouped by sub-basin, as shown in Figure 5, and then combined with the CHIRPS precipitation time series to form the matrix given in Equation 1.
where the matrix holds the data of each sub-area; P is precipitation; Q is flow; n is the CHIRPS pixel number; and m indexes the monthly precipitation or flow data.
The schematization of the hydrological logic in the data organization to complete missing data. Q is streamflow data.
The data matrix is organized so that all the variables are aligned in time along the columns. The rows hold the CHIRPS precipitation series, pixel by pixel, and the flow time series to be reconstructed is placed at the end of the matrix.
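One layout consistent with this description and with the symbols defined for Equation 1 is sketched below. The matrix name X_s and the placement of the flow series in the last row are assumptions made for illustration only, not a reproduction of the published equation.

```latex
\[
X_{s} =
\begin{bmatrix}
P_{1,1} & P_{1,2} & \cdots & P_{1,m} \\
P_{2,1} & P_{2,2} & \cdots & P_{2,m} \\
\vdots  & \vdots  & \ddots & \vdots  \\
P_{n,1} & P_{n,2} & \cdots & P_{n,m} \\
Q_{1}   & Q_{2}   & \cdots & Q_{m}
\end{bmatrix}
\]
```

Under this reading, a sub-area with n CHIRPS pixels and one flow station yields n + 1 time series of m monthly values each, with gaps occurring only in the flow row.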
Validation of the monthly flow missing data imputation
Once the flow time series were reconstructed, they were compared with the observed time series recorded by the in situ hydrological stations using three performance metrics and graphical comparisons. The performance metrics used were Spearman's Correlation Coefficient (CC), Percent Bias (PBias), and Root Mean Square Error (RMSE). The corresponding equations for calculating these performance metrics can be seen in Table 2.
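The two magnitude-based metrics can be sketched as follows, evaluated only for months with an observation. The sign convention for PBias (negative values meaning underestimation) is an assumption chosen to match the way the results are described in this paper, and the error metric is written in its root form, which carries the m3/s units in which the errors are reported; squaring it gives the mean square error. The function names are hypothetical, and Spearman's coefficient is omitted here for brevity.

```python
import math


def _paired(obs, est):
    """Keep only the months in which an observed value exists."""
    return [(o, e) for o, e in zip(obs, est)
            if o is not None and not math.isnan(o)]


def pbias(obs, est):
    """Percent bias; negative values indicate underestimation (assumed sign)."""
    pairs = _paired(obs, est)
    return 100.0 * sum(e - o for o, e in pairs) / sum(o for o, _ in pairs)


def rmse(obs, est):
    """Root mean square error, in the units of the data."""
    pairs = _paired(obs, est)
    return math.sqrt(sum((e - o) ** 2 for o, e in pairs) / len(pairs))
```

For example, with observed flows of 10, 20 and 30 m3/s and estimates of 9, 21 and 28 m3/s, the bias is -2/60, or about -3.3%, and the RMSE is the square root of 2, about 1.41 m3/s.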
As a complement to these metrics, graphical validation was also performed. The comparisons included side-by-side plots of the observed and estimated time series, providing a quick visual check of how well the reconstructed series fit the real data in time. These visual comparisons also reveal periods in which the model may miss peaks, underperform, or diverge from the observed values.
In addition, a cumulative flow plot and a residual plot were produced for each station. The cumulative plots visualize the potential long-term impact of the imputed data, including any systematic over- or underestimation over the entire study period. The residual plots, which show the differences between observed and estimated values, reveal patterns, trends, and changes in variance over time. Combined with the quantitative metrics, these graphical analyses allowed both numerical and temporal verification of the solution, a general approach to validation that may be more widely applicable.
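The quantities behind these two plots can be computed with a short sketch. It assumes that residuals are defined as observed minus estimated and that only months with an observation enter the comparison; the text does not state either convention, and the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_and_residuals(obs, est):
    """Return cumulative observed flow, cumulative estimated flow, residuals."""
    pairs = [(o, e) for o, e in zip(obs, est) if o is not None]
    cum_obs = list(accumulate(o for o, _ in pairs))
    cum_est = list(accumulate(e for _, e in pairs))
    residuals = [o - e for o, e in pairs]     # assumed: observed - estimated
    return cum_obs, cum_est, residuals
```

A gap between the two cumulative curves that widens steadily points to systematic bias, whereas residuals that scatter evenly around zero suggest errors without a persistent direction.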
RESULTS AND DISCUSSION
Spatio-temporal characterization of precipitation and flow in the basin
Average annual rainfall in the Cravo Sur basin is 2,172 mm, with lower precipitation between December and March (dry season) and higher precipitation between April and November (wet season) (Ruíz-Ochoa et al., 2022). The highest average monthly flows coincide with the wet season and the low flows with the dry season, as clearly seen in Figures 6 and 7, showing unimodal hydroclimatological behaviour. At La Estación station, the average discharge is 245 m3/s; February (the driest month) has a multiannual monthly discharge of 34 m3/s, while July has the highest discharge, at 482 m3/s. Although these flow fluctuations also occur within the sub-areas, the same flow distribution is generally found every year, with a maximum in April, May, and June and a minimum in January and February.
The CHIRPS isohyets show the spatial distribution of rainfall over the Cravo Sur basin (Figure 6). The highest annual precipitation is found in the central part of the basin, where the orographic transition from the Andes to the Orinoco Plains occurs. This zone receives about 2,400 mm/year, compared with about 1,000 mm/year in the upper basin. The lowland areas of the basin also receive significant precipitation, with annual amounts approaching 2,000 mm/year.
In the Cravo Sur Basin, the precipitation and streamflow regime is unimodal. It is mainly influenced by the latitudinal movement of the Intertropical Convergence Zone, by modes of oceanic–atmospheric variability such as El Niño–Southern Oscillation (ENSO) (Builes‐Jaramillo et al., 2022), and by interactions with local orographic and convective processes that exert significant control over the spatio-temporal hydroclimatology (Paredes-Trejo et al., 2023). Furthermore, the Orinoco Low-Level Jet (OLLJ) acts over this area; it is one of the mechanisms involved in moisture transport and is associated with the annual cycle of precipitation, since strong OLLJ winds coincide with the dry season (DJF), whereas weak jet winds occur during the wet season (JJA) (Builes‐Jaramillo et al., 2022).
Filling in missing flow rate data without exogenous variables
Missing data from four hydrologic stations between 1985 and 2020 were completed using the NLPCA described above. Reconstructing the flow data was challenging, considering missing data percentages of up to 63% and data variability ranging from 49% to 71% (see Table 1).
The completion of missing data using only the flow information is shown in Figure 8. Graphically, the flow estimates perform well at the El Playón and La Estación stations, which have the lowest percentages of missing data. In contrast, the reconstruction is erratic at the Puente Yopal and Puente Carretera stations, mainly over long periods without flow information. The reconstruction also yields negative flow values at all four stations, which makes no physical sense because flow is a non-negative quantity.
Graphical representation of missing flow rate data completion without using exogenous variables for the stations Puente Yopal, Puente Carretera, El Playón and La Estación in the Cravo Sur basin.
Figure 8 shows that the missing data are continuous periods without information. Usually, it is a period where the station suffered some damage, and no information is reported for several consecutive months. The most critical example is the Puente Carretera station, which has 180 consecutive months without information, followed by La Estación station, which has 75 consecutive months without information.
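The length of the longest continuous gap in a monthly series, such as the runs of consecutive months described above, can be obtained with a short sketch. It assumes missing months are encoded as None or NaN, and the function name is hypothetical.

```python
import math


def longest_gap(series):
    """Length of the longest run of consecutive missing values."""
    longest = current = 0
    for v in series:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            current += 1
            longest = max(longest, current)
        else:
            current = 0                       # an observation ends the run
    return longest
```

Reporting this run length alongside the overall percentage of missing data helps distinguish scattered gaps from long outages, which are harder to reconstruct.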
Filling in missing flow data with exogenous variables
Precipitation from the CHIRPS database was used as an exogenous variable to improve missing data imputation. Grouping the data by sub-area is justified by the direction of water flow and by the hydrological response of the flow of the Cravo Sur, La Niata and Tocaría rivers to precipitation. This relationship was confirmed by the statistical correlation between each flow time series and the time series recorded by a random CHIRPS pixel in the sub-area (Figure 9). A correlation of 0.80 is observed between the El Playón flow and a precipitation time series at the centroid of the Tocaría River sub-area, and similar results are observed in the other sub-areas.
Correlation maps between monthly flow and precipitation for each sub-basin of Cravo Sur basin.
The stations selected in Table 1 were defined as the closure point of the sub-areas. The Puente Yopal station is the closure point of the mountainous part of the Cravo Sur River; the Puente Carretera station is the closure point of the La Niata River; the El Playón station is the closure point of the Tocaría River, and the La Estación station is the closure point of the entire Cravo Sur River basin as shown in Figure 10.
Spatialization of the sub-basins constructed for the completion of missing data in the Cravo Sur basin.
The neural network architecture corresponds to the structure [n, n-1, n], where n is the number of time series comprising the previously constructed matrices. The architecture of the Puente Yopal station was [36, 35, 36], Puente Carretera station [6, 5, 6], El Playón station [47, 46, 47] and La Estación station [168, 167, 168].
The flow data reconstructed using the neural network reproduce the behaviour of the real flow data, as shown graphically in Figure 11. The reconstruction emulates the flow behaviour in average months and even in extreme events of maximum and minimum flow. The reconstructed values did not exceed the maximum and minimum flow values recorded by the stations.
Graphical representation of the completion of missing flow rate data using exogenous variables for the stations Puente Yopal, Puente Carretera, El Playón and La Estación of Cravo Sur basin.
Performance of the reconstruction of missing flow data
Table 3 compares the performance of the NLPCA method without exogenous variables and using precipitation as exogenous variables. A good performance is observed in both methods, but it is notably better in the NLPCA method with exogenous variables.
The performance metrics show that the missing data completion method performs best when precipitation is used as an exogenous variable. All stations show a Spearman correlation coefficient close to 1.0; the lowest is at Puente Carretera station, with a value of 0.95. This indicates a direct relationship between the variability of the flow estimated by the method and that of the real flow. Regarding the magnitude of the reconstructed variable, PBias was low, with a slight underestimation of -2.1% at the Puente Carretera station; at the other stations, the PBias value was close to 0. Similarly, the root mean square error is low for all stations, the highest value being 1.93 m3/s at Puente Carretera station. This value is low considering that the standard deviation for Puente Carretera station is 6.9 m3/s.
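As a quick check of this comparison, the ratio of the largest reported error to the standard deviation at Puente Carretera, using only the two values quoted above, is:

```latex
\[
\frac{\text{error}}{\sigma}
= \frac{1.93\ \mathrm{m^3/s}}{6.9\ \mathrm{m^3/s}}
\approx 0.28
\]
```

That is, the largest error is a little over a quarter of the station's natural variability.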
Figure 12 compares cumulative observed and estimated streamflow at Puente Carretera, Puente Yopal, El Playón, and La Estación. For each station, the figure shows the cumulative series of observed values (solid blue line) and estimated values (dashed red line), giving a visual representation of the accuracy of the imputation method over the complete study period (1985–2020). This comparison shows how well the method reproduces general trends and the variability in accumulated streamflow at the level of individual sub-basins. Although the estimated values closely track the observations at most stations, some divergences are observed. The graphical divergence is strongest at the Puente Carretera station, and some discrepancies between observed and estimated values appear at the El Playón station during the period 2000–2010. These differences can be attributed either to variations in the local hydrological response or to limitations of the estimation method under specific conditions. In sum, the figure shows how well the proposed method reproduces total streamflow in data-poor regions and identifies potential areas of improvement.
Comparison of cumulative observed and estimated streamflow without exogenous variables at selected stations in the Cravo Sur basin.
A residual analysis of the observed vs. estimated streamflow at the four monitoring stations in the Cravo Sur basin is shown in Figure 13. The Puente Carretera station residuals are slightly positive, indicating a slight overestimation. The consistent proximity of the residuals to zero across the stations highlights the strength of the method in representing streamflow trends accurately. Moreover, the plots show roughly constant residual variance across all stations, which suggests that the model's performance does not depend on the specific time period and that its estimates remain reliable. The lack of any evident patterns or systematic biases in the residuals further indicates the model's ability to adapt to different hydrological situations within the Cravo Sur basin. In conclusion, this residual analysis supports the validation of the imputation method, which provides estimates that are consistent with observed values even in a region with complex hydrological variability.
Residual analysis of observed and estimated streamflow without exogenous variables at selected stations in the Cravo Sur basin.
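The two diagnostics described above (cumulative curves and residuals) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the sample values are hypothetical, and the residual is assumed to be estimated minus observed, which matches the reading that positive residuals indicate overestimation.

```python
from itertools import accumulate


def cumulative_and_residuals(observed, estimated):
    """Return cumulative observed, cumulative estimated, and residuals.

    Months with a missing observation (None) are skipped so that both
    cumulative curves are built from the same set of months.
    """
    pairs = [(o, e) for o, e in zip(observed, estimated) if o is not None]
    obs = [o for o, _ in pairs]
    est = [e for _, e in pairs]
    # Assumed sign convention: positive residual = overestimation.
    residuals = [e - o for o, e in pairs]
    return list(accumulate(obs)), list(accumulate(est)), residuals


# Illustrative monthly flows in m3/s (not data from the study).
observed = [10.0, None, 14.0, 9.0]
estimated = [11.0, 12.0, 13.0, 9.0]
cum_obs, cum_est, res = cumulative_and_residuals(observed, estimated)
```

Plotting `cum_obs` against `cum_est` gives the kind of comparison shown in the cumulative figures, while `res` plotted against time gives the residual view.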
Figure 14 and Figure 15 illustrate the accuracy and effectiveness of the imputation method when precipitation is included as an exogenous variable. In the cumulative streamflow comparison, the observed and simulated time series align almost perfectly, indicating a high level of accuracy in replicating the streamflow patterns across all stations. This is further supported by the residual analysis, where the residuals are of very low magnitude, suggesting minimal discrepancies between observed and estimated values. The consistency of these residuals around zero across all stations highlights the model’s robustness and its ability to maintain stable performance throughout the time series.
Comparison of cumulative observed and estimated streamflow with exogenous variables at selected stations in the Cravo Sur basin.
Residual analysis of observed and estimated streamflow with exogenous variables at selected stations in the Cravo Sur basin.
Table 4 presents the NLPCA results. Component 1 accounts for most of the variance at all stations, suggesting that it contains the most significant underlying features of the data. For example, in the analysis 'Without Exogenous Variables', Component 1 has an eigenvalue of 2.62 and accounts for 87.36% of the variance, indicating that it captures the main variability of the streamflow series. Component 2 explains a smaller proportion of the variance and reflects secondary signals, in this case seasonal or local trends that are not covered by Component 1.
The stations with higher eigenvalues and percentages of explained variance, such as La Estación and Puente Yopal, showed the best imputation performance. The large variance explained by Component 1 at these stations suggests that a single component captures the dominant trends in these datasets, indicating that NLPCA can provide good imputation there. Importantly, Puente Yopal shows non-stationary behavior with a positive trend (as identified in Table 1), which was reasonably captured by both nonlinear principal components obtained with this method. This further illustrates the capability of NLPCA to handle nonlinear and non-stationary streamflow data.
At Puente Carretera, for example, Component 1 explains 85.74% of the variance, whereas at El Playón Component 2 contributes more: Component 1 explains 70.09% and Component 2 the remaining 29.91%. This distribution suggests more complex flow patterns at El Playón, where more dimensions are needed to fully explain the variance. The eigenvector values of each component indicate the contribution of the original variables, which is related to water residence time and geographical factors specific to each station. More broadly, Component 1 captures a high share of the variance across stations, which supports the use of NLPCA for effective dimensionality reduction of datasets dominated by a small number of patterns.
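The percentages of explained variance discussed above follow from the eigenvalues when each component's share is taken as its eigenvalue divided by the sum of all eigenvalues. The sketch below assumes that definition applies to Table 4; the function name and the eigenvalues are hypothetical and chosen only for illustration.

```python
def explained_variance_percent(eigenvalues):
    """Share of total variance per component, in percent.

    Assumes each share is eigenvalue / sum(eigenvalues), as in
    standard principal component analysis.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical two-component case (not values from Table 4).
shares = explained_variance_percent([3.0, 1.0])
```

With two retained components the shares sum to 100%, which is the pattern reported for El Playón (70.09% and 29.91%).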
This highlights the robustness of the method, particularly given the challenging morphological, climatological, and anthropogenic drivers in the Cravo Alto sub-basin, where a correlation coefficient of 0.99, a Pbias of 0.0, and an MSE of 1.34 were attained at the Puente Yopal station. The Cravo Alto sub-basin has undergone extensive landscape changes and shows a significant positive Sen's slope and non-stationarity according to the Phillips-Perron test, suggesting important streamflow modifications in recent decades. Such transformations lead to non-stationary streamflow data, so observed hydrological changes may need to be attributed to natural variability or to human-induced change. Despite these complexities, which included a 23% rate of missing data and a coefficient of variation of 49%, the proposed method performed well in treating missing data while retaining the trend changes and non-stationary behaviour of the sub-basin. These findings highlight the approach's flexibility with respect to natural variability and human-induced change, even in difficult hydrological environments.
DISCUSSION
The performance metrics obtained in this research are competitive with, and in many cases superior to, those of missing flow data completion methods that other researchers have applied in basins around the world. For example, Meher (2019) reported correlation coefficients (CCs) between 0.15 and 0.95 when completing missing flow data for the Mahanadi River basin in India using artificial intelligence. Elshorbagy et al. (2002) demonstrated better performance for missing data completion with Artificial Neural Network (ANN) based models than with the K nearest neighbour (Knn) method. The authors report an MSE of 80.2 m3/s for the ANN method and 255.5 m3/s for the Knn method for the English River in Canada. Zhang et al. (2022) estimated an MSE of 5.4 m3/s and a mean percentage error of 44% for river flow estimates on the Yunnan-Guizhou Plateau (China).
Previous research in Colombia has used other performance metrics to evaluate the robustness of the missing data completion process. For example, Canchala-Nastar et al. (2020) estimated flow using the NLPCA method without exogenous variables. They determined a Relative Mean Squared Error of 0.03 and 0.39 for the Atrato and Patía rivers in the Colombian Pacific.
Flow data imputation methods can be classified into two main groups. The first group is based on hydrological and rainfall-runoff models, and the second is based on statistical methods, numerical models, and artificial intelligence. Methods in the first group give good results, which can be enhanced with advanced remote sensing. For example, Ergün & Demirel (2023) used this methodology to complete flow data in the Moselle (Europe) and Konya (Turkey) basins. This method made it possible to reconstruct flow data for up to 365 consecutive days without information.
Similarly, Lou et al. (2022) used remote sensing to measure river widths from Landsat-8 imagery and estimate the flow of large rivers in the Yunnan-Guizhou Plateau (China). In the second group, statistical techniques and numerical models, some supported by artificial intelligence, are used to estimate missing flow data. Among these, Tencaliec et al. (2015) used autoregressive integrated moving average (ARIMA) models to estimate flow in the Durance basin (France). Oyerinde et al. (2021) used multivariate imputation in chained equations to estimate flow in the Niger River basin. Kim et al. (2015) used a soil and water assessment tool (SWAT) to estimate flow in the Taewa River (South Korea), showing good results for estimating low flows.
Studies using artificial intelligence to complete flow data have shown results that stand out from the other methods. For example, ANN and Kohonen’s self-organizing maps were used to complete flow data in the Taewa River (South Korea) (Kim et al., 2015). Similarly, the MissForest method was used in the basins of central Chile (Arriagada et al., 2021). ANN was also used in the English River basin in Canada (Elshorbagy et al., 2002) and to complete daily flow data in the Langat River basin (Malaysia) (Mispan et al., 2015).
The robust performance in estimating missing flow data is attributed to two main reasons. The first is the advantage of nonlinear and artificial intelligence-based methods, as mentioned above and as demonstrated by Hamzah et al. (2020) in their recent state-of-the-art review. The second is the strengthening of the method through the use of precipitation as an exogenous variable. Arriagada et al. (2021) demonstrated that the completion of missing flow data using artificial intelligence improved when the number of predictor records and the duration of the record were increased, with the MissForest method achieving satisfactory results with 20 or more flow records of 15 or more years in 10 basins in central Chile.
The main modes of variability of the hydroclimatological information of the area were determined by the monthly precipitation information from CHIRPS. These time series organized at the sub-basin level were sufficient to obtain the principal components that explain the variability of the territory. This differs from the study reported by Di Vittorio & Georgakakos (2021), who used the CHIRPS database to model runoff from regions surrounding the Sudd wetland in the Nile River basin (Africa). The main difference is that the study of Di Vittorio & Georgakakos (2021) used CHIRPS information to model the spatial variability of runoff, while the current research used precipitation information as a source of information to explain the modes of variability of the rainfall-runoff process supported by NLPCA.
The selection of the imputation method is related to the information available to complete the missing data, but it also depends on the percentage of missing data in the time series to be completed. Bleidorn et al. (2022) studied missing flow data completion assuming 5%, 10%, 15%, 25% and 40% of missing data and using ten data imputation methodologies. The results indicate that for 5% missing data, any imputation methodology can be considered; however, as the proportion of missing data increases, multiple imputation and maximum likelihood methodologies are recommended when supporting stations are available for data treatment. On the other hand, the study of Tencaliec et al. (2015) in the Durance basin (France) suggests that the choice of model depends mainly on the characteristics and hydrological regime of the station, and that no generalization can be made that yields optimal results for all stations.
An additional advantage of the data completion method proposed in this research is its versatility in completing the information adequately in stations with different percentages of missing data. In this case study, the percentage of missing data varies between 5% and 63%. In addition, there are continuous periods of missing data of up to 15 consecutive years without information. The versatility and robustness of the proposed method overcame these particularities of the flow database.
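The two gap descriptors mentioned above (percentage of missing data and length of the longest continuous gap) can be computed with a short routine like the one below. This is a sketch with hypothetical names and illustrative values, and it assumes missing months are encoded as `None`.

```python
def gap_summary(series):
    """Return (percent missing, longest run of consecutive missing values).

    Missing values are assumed to be encoded as None.
    """
    if not series:
        raise ValueError("series must not be empty")
    missing = sum(1 for v in series if v is None)
    longest = current = 0
    for v in series:
        # Extend the current gap on a missing value, otherwise reset it.
        current = current + 1 if v is None else 0
        longest = max(longest, current)
    return 100.0 * missing / len(series), longest


# Illustrative monthly flows in m3/s (not data from the study).
flows = [5.0, None, None, 7.0, None, 6.0, 4.0, 3.0]
rate, longest_gap = gap_summary(flows)
```

For a monthly series, dividing `longest_gap` by 12 expresses the longest gap in years, which is the unit used in the text.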
The proposed method, rooted in an understanding of the hydrological behavior of the Cravo Sur basin, leverages the relationship between precipitation and streamflow to reconstruct missing streamflow records. This relationship is governed by processes such as infiltration, surface runoff, and evapotranspiration, which collectively transform rainfall into river discharge. The model effectively captures the complex rainfall-streamflow dynamics across the basin by incorporating precipitation as an exogenous variable. This is evident in the nearly identical cumulative curves between observed and imputed values and the low residuals, indicating minimal estimation error. These results highlight the model’s suitability for data-limited regions and demonstrate its capacity to accurately extend and fill streamflow records, thereby supporting broader hydrological applications.
The spatial and temporal variability of rainfall, as observed with in situ rain gauge data and CHIRPS satellite data, is relevant to understanding the hydrological response of the Cravo Sur basin and its sub-basins. In particular, the distribution of precipitation directly drives runoff timing and volume in the sub-basins and therefore increases streamflow variability. This extensive rainfall input helps the imputation method improve streamflow modelling and obtain higher accuracy in missing data estimation. Interestingly, sub-basins like Cravo Alto and Cravo Sur present the strongest imputation accuracy due to their similar hydrological behavior, with high flow magnitudes and low coefficients of variation that favor stable runoff.
Unlike other approaches based on absolute precipitation values, the method uses precipitation anomalies and can thus represent different hydrological conditions within each sub-basin. This data-driven approach, built on dimensionless anomalies and nonlinear principal components, decreases reliance on absolute values and generates repeatable imputation outcomes across a diverse spectrum of hydrological environments. This NLPCA-based missing data imputation is innovative in that traditional statistical models have poorly represented the nonlinear relationships within hydrological data in the region. The method incorporates CHIRPS data and represents nonlinear interactions between precipitation and streamflow more precisely, thus addressing the difficulties posed by limited monitoring stations and the complex geomorphology of the Cravo Sur basin.
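One common way to obtain dimensionless precipitation anomalies is to standardize each value by the mean and standard deviation of its calendar month. The section does not state the exact anomaly definition used, so the sketch below is an assumption for illustration only; the function name and the sample series are hypothetical.

```python
import statistics


def monthly_standardized_anomalies(values, start_month=1):
    """Standardize a monthly series by calendar month.

    values: monthly totals in chronological order, with no gaps.
    start_month: calendar month (1-12) of the first value.
    Months with zero spread are assigned an anomaly of 0.0.
    """
    by_month = {}
    for i, v in enumerate(values):
        by_month.setdefault((start_month - 1 + i) % 12, []).append(v)
    # Climatological mean and population standard deviation per month.
    clim = {m: (statistics.mean(v), statistics.pstdev(v))
            for m, v in by_month.items()}
    anomalies = []
    for i, v in enumerate(values):
        mean, sd = clim[(start_month - 1 + i) % 12]
        anomalies.append((v - mean) / sd if sd > 0 else 0.0)
    return anomalies


# Two illustrative years of monthly precipitation in mm (not CHIRPS data):
# the second year is 20 mm wetter than the first in every month.
year1 = [100.0 + 10.0 * m for m in range(12)]
year2 = [v + 20.0 for v in year1]
anoms = monthly_standardized_anomalies(year1 + year2)
```

Because the seasonal cycle is removed month by month, the first year comes out uniformly below its climatology and the second uniformly above it, regardless of the absolute rainfall amounts.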
The method’s flexibility is particularly valuable as climate change introduces shifts in baseline hydrological patterns. In Cravo Alto, for instance, the observed positive streamflow trend aligns with broader global trends of increased hydrological variability. The NLPCA method’s adaptability to non-stationary behaviors and climate-driven anomalies highlights its potential for climate-resilient water resource management. Future research could expand on these findings by exploring the integration of long-term climate projections to enhance the model's robustness in regions experiencing intensified climate impacts.
For optimal application, the precipitation time series should match or exceed the flow time series in length, with both series aligned temporally. The efficacy of this approach increases with the availability of additional precipitation records, which likely accounts for the lower performance observed at Puente Carretera, where limited precipitation data from only five CHIRPS centroids in the La Niata sub-basin may have constrained model accuracy.
CONCLUSIONS
The graphical comparisons and correlation results highlight the benefits of the proposed imputation approach. The approach allowed the reconstruction of monthly flow data at stations that had a high percentage of missing data and significant variability. As an example, the Puente Carretera station, with an average flow rate of 8.01 m3/s, a standard deviation of 6.9 m3/s, and 63% of data missing, reached a reasonable imputation estimate of CC = 0.95, Pbias = −2.1% and MSE = 1.93 m3/s, an error smaller than the standard deviation, which indicates robust imputation accuracy.
The method we present here, utilizing flow data from IDEAM, CHIRPS data, and the NLPCA tool, has the potential to impact the field significantly. It can effectively reconstruct flow data from poorly instrumented basins or those with low-quality hydrometric information, making it a promising tool for the Colombian Orinoquía region.
Flow data imputation methods can be broadly categorized into two families: those based on hydrological or analogous models and those based on black-box numerical models. Both have been used around the world, and both have their benefits and drawbacks. Black-box methods, with low data requirements and realistic approximations of flow, are the method of choice for most basins, few of which are well monitored, as is the case in the Colombian Orinoquía. Such methods are well suited to highly variable regions with complex hydrological cycles and diverse water resources, such as the Cravo Sur River basin.
This imputation method could be applied to various types of hydrological time series in future research and contribute to improving methods of streamflow extension. A physical model relating rainfall to streamflow generation would also help; this would require a model that includes soil moisture dynamics, catchment characteristics (e.g. topography, land use) and lag times (the delay between rainfall and streamflow response). A sensitivity analysis on slope, vegetation cover, and soil type could also provide insight into the physical conditions that maximize or constrain performance. Coupling auxiliary data such as soil moisture, land-use changes, and groundwater levels could also enhance the potential of non-linear approaches in complex hydrological settings. Additionally, hybrid approaches combining black-box and standard hydrological models deserve further exploration. They may provide robust solutions that exploit the merits of both empirical predictions and basin-specific physical characteristics.
Evaluating the robustness and versatility of this imputation method in basins with different hydrological and climatic characteristics and under different data availability scenarios would be a good step forward.
ACKNOWLEDGEMENTS
The authors would like to thank the Fundación Universitaria de San Gil, the Research Group on Water Resources and Soil Engineering - IREHISA of the Universidad del Valle and the Environmental Engineering Group - GIA of the Universidad Mariana for their contributions to this research work. Thanks are also due to IDEAM, the United States Geological Survey (USGS), and the University of California, Santa Barbara (UCSB) for providing the databases containing flow data for the Colombian Orinoquía and precipitation data from CHIRPS. The research was funded by the Ministry of Science, Technology and Innovation (MinCiencias) of Colombia through the project “Fortalecimiento de los sistemas de información de la calidad y valoración de los servicios ambientales del agua para contribuir al desarrollo sostenible del sector agroindustrial del departamento de Casanare. Yopal Nunchía” (BPIN Code: 2020000100435), financed with resources from the Science, Technology and Innovation - CTeI fund of the General System of Royalties - SGR.
REFERENCES
Arriagada, P., Karelovic, B., & Link, O. (2021). Automatic gap-filling of daily streamflow time series in data-scarce regions using a machine learning algorithm. Journal of Hydrology, 598, 126454. http://doi.org/10.1016/j.jhydrol.2021.126454
Bleidorn, M. T., Pinto, W. P., Schmidt, I. M., Mendonça, A. S. F., & Reis, J. A. T. (2022). Methodological approaches for imputing missing data into monthly flows series. Revista Ambiente & Água, 17(2), 1-27. http://doi.org/10.4136/ambi-agua.2795
Builes‐Jaramillo, A., Yepes, J., & Salas, H. D. (2022). The Orinoco low‐level jet during El Niño–Southern oscillation. International Journal of Climatology, 42(15), 7863-7877. http://doi.org/10.1002/joc.7681
Canchala-Nastar, T., Carvajal-Escobar, Y., Alfonso-Morales, W., Loaiza Cerón, W., & Caicedo, E. (2019). Estimation of missing data of monthly rainfall in southwestern Colombia using artificial neural networks. Data in Brief, 26, 104517. PMid:31667280. http://doi.org/10.1016/j.dib.2019.104517
Canchala-Nastar, T., Loaiza Cerón, W., Francés, F., Carvajal-Escobar, Y., Andreoli, R., Kayano, M., Alfonso-Morales, W., Caicedo-Bravo, E., & Ferreira de Souza, R. (2020). Streamflow variability in colombian pacific basins and their teleconnections with climate indices. Water, 12(2), 526. http://doi.org/10.3390/w12020526
Castillo-Gómez, J. S. D., Canchala, T., Torres-López, W. A., Carvajal-Escobar, Y., & Ocampo-Marulanda, C. (2023). Estimation of monthly rainfall missing data in Southwestern Colombia: comparing different methods. Revista Brasileira de Recursos Hídricos, 28, e9. http://doi.org/10.1590/2318-0331.282320230008
Castro-Heredia, L. M. C., Carvajal-Escobar, Y., & Ávila-Díaz, J. Á. (2012). Cluster analysis as a technique for exploratory analysis of multiple records in meteorological data. Ingeniería de Recursos Naturales y del Ambiente, (11), 11-20. Retrieved in 2024, July 26, from https://www.redalyc.org/pdf/2311/231125817001.pdf
Cerón, W. L., Molina-Carpio, J., Ayes Rivera, I., Andreoli, R. V., Kayano, M. T., & Canchala, T. (2020). A principal component analysis approach to assess CHIRPS precipitation dataset for the study of climate variability of the La Plata Basin, Southern South America. Natural Hazards, 103(1), 767-783. http://doi.org/10.1007/s11069-020-04011-x
Colombia. Corporación Autónoma Regional de la Orinoquia – Corporinoquia. (2010). Resolución nº. 200.41-10.1402. “Por medio de la cual se regula el uso y aprovechamiento del recurso hídrico en el río Tocaría”. Diario Oficial, Bogotá, D. C. Retrieved in 2024, July 26, from www.corporinoquia.gov.co
Colombia. Corporación Autónoma Regional de la Orinoquia – Corporinoquia. Corporación Autónoma Regional de Boyacá – Corpoboyaca. (2015a). Evaluación del componente hidrográfico. In Corporación Autónoma Regional de la Orinoquia (Ed.), Plan de Ordenación y Manejo de la Cuenca del Río Cravo Sur – POMCA (Cap. 2, pp. 1-25). Casanare, Colombia. Retrieved in 2024, July 26, from www.corporinoquia.gov.co
Colombia. Corporación Autónoma Regional de la Orinoquia – Corporinoquia. Corporación Autónoma Regional de Boyacá – Corpoboyaca. (2015b). Evaluación de la zonificación de las amenazas en la cuenca del Río Cravo Sur. In Corporación Autónoma Regional de la Orinoquia (Ed.), Plan de Ordenación y Manejo de la Cuenca del Río Cravo Sur – POMCA (Cap. 6, pp. 1-20). Casanare, Colombia. Retrieved in 2024, July 26, from www.corporinoquia.gov.co
Colombia. Corporación Autónoma Regional de la Orinoquia – Corporinoquia. Corporación Autónoma Regional de Boyacá – Corpoboyaca. (2018). Actualización del plan de ordenación y manejo de la Cuenca del Río Cravo sur código 3521. Yopal, Colombia. Retrieved in 2024, July 26, from https://www.corpoboyaca.gov.co/cms/wp-content/uploads/2018/06/resumen-ejecutivo-rio-cravo-sur.pdf
Colombia. Ministerio de Ambiente y Desarrollo Sostenible – MADS. (2023). Plataforma colaborativa 8 río Cravo Sur. Bogotá, D. C. Retrieved in 2024, July 26, from https://www.minambiente.gov.co/gestion-integral-del-recurso-hidrico/plataformas-colaborativas/plataforma-colaborativa-8-rio-cravo-sur/
Di Vittorio, C. A., & Georgakakos, A. P. (2021). Hydrologic modeling of the sudd wetland using satellite-based data. Journal of Hydrology. Regional Studies, 37, 100922. http://doi.org/10.1016/j.ejrh.2021.100922
Duarte, L. V., Formiga, K. T. M., & Costa, V. A. F. (2022). Comparison of methods for filling daily and monthly rainfall missing data: statistical models or imputation of satellite retrievals? Water, 14(19), 3144. http://doi.org/10.3390/w14193144
Elshorbagy, A., Simonovic, S. P., & Panu, U. S. (2002). Estimation of missing streamflow data using principles of chaos theory. Journal of Hydrology, 255(1-4), 123-133. http://doi.org/10.1016/S0022-1694(01)00513-3
Ergün, E., & Demirel, M. C. (2023). On the use of distributed hydrologic model for filling large gaps at different parts of the streamflow data. Engineering Science and Technology, an International Journal, 37, 101321. http://doi.org/10.1016/j.jestch.2022.101321
Funk, C. C., Peterson, P. J., Landsfeld, M. F., Pedreros, D. H., Verdin, J. P., Rowland, J. D., Romero, B. E., Husak, G. J., Michaelsen, J. C., & Verdin, A. P. (2014). A quasi-global precipitation time series for drought monitoring. U.S. Geological Survey Data Series, 832(4), 1-12. http://doi.org/10.3133/ds832
Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A., & Michaelsen, J. (2015). The climate hazards infrared precipitation with stations-a new environmental record for monitoring extremes. Scientific Data, 2(1), 150066. PMid:26646728. http://doi.org/10.1038/sdata.2015.66
Hamzah, F. B., Hamzah, F. M., Razali, S. M., & Samad, H. (2021). A comparison of multiple imputation methods for recovering missing data in hydrological studies. Civil Engineering Journal, 7(9), 1608-1619. http://doi.org/10.28991/cej-2021-03091747
Hamzah, F. B., Mohd Hamzah, F., Mohd Razali, S. F., Jaafar, O., & Abdul Jamil, N. (2020). Imputation methods for recovering streamflow observation: a methodological review. Cogent Environmental Science, 6(1), 1745133. http://doi.org/10.1080/23311843.2020.1745133
Instituto de Hidrología, Meteorología y Estudios Ambientales – IDEAM. (2013). Zonificación y codificación de las unidades hidrográficas e hidrogeológicas de Colombia. Bogotá, D.C.: IDEAM. Retrieved in 2024, July 26, from https://www.minambiente.gov.co/wp-content/uploads/2021/10/MEMORIAS-MAPA-ZONIFICACION-HIDROGRAFICA.pdf
Instituto Geográfico Agustín Codazzi – IGAC. (2020). Atlas de suelos y cartografía del departamento del Casanare. Bogotá, D.C. Retrieved in 2024, July 26, from https://www.igac.gov.co/
Instituto Geográfico Agustín Codazzi – IGAC. (2021). Caracterización del uso del suelo y el paisaje en los Llanos Orientales. Bogotá, D.C. Retrieved in 2024, July 26, from https://www.igac.gov.co/http://documentacion.ideam.gov.co/openbiblio/bvirtual/022655/MEMORIASMAPAZONIFICACIONHIDROGRAFICA.pdf
Jiang, Y., Bao, X., Hao, S., Zhao, H., Li, X., & Wu, X. (2020). Monthly streamflow forecasting using ELM-IPSO based on phase space reconstruction. Water Resources Management, 34(11), 3515-3531. http://doi.org/10.1007/s11269-020-02631-3
Kim, M., Baek, S., Ligaray, M., Pyo, J., Park, M., & Cho, K. (2015). Comparative studies of different imputation methods for recovering streamflow observation. Water, 7(12), 6847-6860. http://doi.org/10.3390/w7126663
Lou, H., Zhang, Y., Yang, S., Wang, X., Pan, Z., & Luo, Y. (2022). A new method for long-term river discharge estimation of small- and medium-scale rivers by using multisource remote sensing and RSHS: application and validation. Remote Sensing, 14(8), 1798. http://doi.org/10.3390/rs14081798
Ly, S., Charles, C., & Degré, A. (2013). Different methods for spatial interpolation of rainfall data for operational hydrology and hydrological modeling at watershed scale: a review. Biotechnologie, Agronomie, Société et Environnement, 17(2), 392-406. Retrieved in 2024, July 26, from https://orbi.uliege.be/bitstream/2268/136084/1/%e2%80%a2%20Degr%c3%a9.pdf
Meher, J. (2019). Missing discharge data filling with artificial neural network. Journal on Civil Engineering, 9(2), 24-31. http://doi.org/10.26634/jce.9.2.14657
Miró, J. J., Caselles, V., & Estrela, M. J. (2017). Multiple imputation of rainfall missing data in the Iberian Mediterranean context. Atmospheric Research, 197, 313-330. http://doi.org/10.1016/j.atmosres.2017.07.016
Mispan, M. R., Rahman, N. F. F. A., Ali, M. F., Khalid, K., Bakar, M. H. A., & Haron, S. H. (2015). Missing river discharge data imputation approach using artificial neural network. Methodology, 25, 20. Retrieved in 2024, July 26, from http://www.arpnjournals.org/jeas/research_papers/rp_2015/jeas_1215_3088.pdf
Ocampo-Marulanda, C., Cerón, W. L., Avila-Diaz, A., Canchala, T., Alfonso-Morales, W., Kayano, M. T., & Torres, R. R. (2021). Missing data estimation in extreme rainfall indices for the Metropolitan area of Cali - Colombia: an approach based on artificial neural networks. Data in Brief, 39, 107592. PMid:34869806. http://doi.org/10.1016/j.dib.2021.107592
Ocampo-Marulanda, C., Fernández-Álvarez, C., Cerón, W. L., Canchala, T., Carvajal-Escobar, Y., & Alfonso-Morales, W. (2022). A spatiotemporal assessment of the high-resolution CHIRPS rainfall dataset in southwestern Colombia using combined principal component analysis. Ain Shams Engineering Journal, 13(5), 101739. http://doi.org/10.1016/j.asej.2022.101739
Oyerinde, G. T., Lawin, A. E., & Adeyeri, O. E. (2021). Multivariate infilling of missing daily discharge data on the Niger basin. Water Practice and Technology, 16(3), 961-979. http://doi.org/10.2166/wpt.2021.048
Paredes-Trejo, F., Olivares, B. O., Movil-Fuentes, Y., Arevalo-Groening, J., & Gil, A. (2023). Assessing the spatiotemporal patterns and impacts of droughts in the Orinoco River basin using earth observations data and surface observations. Hydrology, 10(10), 195. http://doi.org/10.3390/hydrology10100195
Ruíz-Ochoa, M. A., Vargas-Corredor, Y. A., Orduz-Amaya, L. P., & Torres-Corredor, J. S. (2022). Climate variability in water planning in the Cravo Sur river basin (Casanare, Colombia). Información Tecnológica, 33(4), 117-124. http://doi.org/10.4067/S0718-07642022000400117
Scholz, M., Fraunholz, M., & Selbig, J. (2008). Nonlinear principal component analysis: neural network models and applications. In A. N. Gorban, B. Kégl, D. C. Wunsch, & A. Y. Zinovyev (Eds.), Principal manifolds for data visualization and dimension reduction (Lecture Notes in Computational Science and Enginee, Vol. 58). Berlin: Springer. http://doi.org/10.1007/978-3-540-73750-6_2
Scholz, M., Kaplan, F., Guy, C. L., Kopka, J., & Selbig, J. (2005). Non-linear PCA: a missing data approach. Bioinformatics, 21(20), 3887-3895. PMid:16109748. http://doi.org/10.1093/bioinformatics/bti634
Tencaliec, P., Favre, A.-C., Prieur, C., & Mathevet, T. (2015). Reconstruction of missing daily streamflow data using dynamic regression models. Water Resources Research, 51(12), 9447-9463. http://doi.org/10.1002/2015WR017399
Umar, N., & Gray, A. (2023). Comparing single and multiple imputation approaches for missing values in univariate and multivariate water level data. Water, 15(8), 1519. http://doi.org/10.3390/w15081519
Urrea, V., Ochoa, A., & Mesa, O. (2016). Validation of the CHIRPS precipitation database for Colombia at daily, monthly, and annual scales for the period 1981-2014. In XXVII Congreso Latinoamericano de Hidráulica, Lima, Peru. Retrieved in 2024, July 26, from https://www.researchgate.net/profile/Andres-Ochoa-7/publication/310844678/links/583a18ab08ae3d91723f65b7/Validacion-de-la-base-de-datos-de-precipitacion-CHIRPS-para-Colombia-a-escala-diaria-mensual-y-anual-en-el-periodo-1981-2014.pdf
van Oel, P. R., Krol, M. S., & Hoekstra, A. Y. (2011). Downstreamness: a concept to analyze basin closure. Journal of Water Resources Planning and Management, 137(5), 404-411. http://doi.org/10.1061/(ASCE)WR.1943-5452.0000127
Zhang, Y., Luo, J., Peng, J., Zhang, H., Ji, Y., & Wang, H. (2022). A new method for long-term river discharge estimation of small-and medium-scale rivers by using multisource remote sensing and rSHS: application and validation. Remote Sensing, 14(8), 1798. http://doi.org/10.3390/rs14081798
Edited by
Editor-in-Chief: Adilson Pinheiro
Associated Editor: Carlos Henrique Ribeiro Lima
Publication Dates
Publication in this collection: 19 May 2025
Date of issue: 2025
History
Received: 26 July 2024
Reviewed: 11 Jan 2025
Accepted: 19 Feb 2025