On the discretization of river networks for large scale hydrologic-hydrodynamic models

The discretization of river networks is a critical step for computing flow routing in hydrological models. However, when it comes to more complex hydrologic-hydrodynamic models, adaptations in the spatial representation of model calculation units are further required to allow cost-effective simulations, especially for large scale applications. The objective of this paper is to assess the impacts of river discretization on simulated discharge, water levels and numerical stability of a catchment-based hydrologic-hydrodynamic model, using a fixed river length (Δx) segmentation method. The case study was the Purus river basin, a sub-basin of the Amazon, which covers an area that accounts for rapid response upstream reaches to downstream floodplain rivers. Results indicate that the maximum and minimum discharges are less affected by the adopted Δx (reach-length), whereas water levels are more influenced by this selection. It is showed that for the explicit local inertial one-dimensional routing, Δx and the α parameter of CFL (Courant-Friedrichs-Lewy) condition must be carefully chosen to avoid mass balance errors. Additionally, a simple Froude number-based flow limiter to avoid numerical issues is proposed and tested.


INTRODUCTION
Hydrological models are a set of mathematical equations designed to represent components of the hydrological cycle, allowing for the understanding of its processes and many other applications, for instance, the assessment of land use and climate change impacts Bravo et al., 2014;Nóbrega et al., 2011) or operational flood and drought forecasting (Alfieri et al., 2013;Sheffield et al., 2014;Fan et al., 2015). These models usually account for a water budget (rainfall storage and runoff) module and a water flow along river networks (i.e. routing) module.
At larger scales, rivers typically cross several hundreds of kilometers with different climatic and geophysical conditions, encompassing well-defined channels in mountainous regions to extensive floodable areas at lowlands. While simplified physical approaches are known to be suitable for flow routing in steep terrains (Price, 2018), and yet adopted by most of current global hydrological models (Bierkens, 2015;Kauffeldt et al., 2016), complexities emerging from river-floodplain water exchange in mild slopes can lead to flood peak delay and backwater effects on tributaries, which cannot be resolved with simple flow routing methods (Trigg et al., 2009;Yamazaki et al., 2011;Paiva et al., 2011;Fleischmann et al., 2016;Lopes et al., 2018;Zhao et al., 2017). In contrast, hydraulic models are able to represent floodplain and backwater processes at some computational cost and with large data requirements (Chaudhari et al., 2019). Modeling large scale river hydrodynamics can be very challenging, since many of the world largest floodplains occur in ungauged or poorly monitored areas, such as the Amazon, Congo, Niger and Paraguay rivers Pedinotti et al., 2012;Paiva et al., 2013;Tshimanga & Hughes, 2014).
Over the last few decades, significant progress in remote sensing technologies has introduced new perspectives for hydrodynamic modeling studies (Schumann et al., 2009;Yan et al., 2015;Sampson et al., 2016;Sheffield et al., 2018;Bates et al., 2018). The near-global coverage of Shuttle Radar Topography Mission -SRTM (Farr et al., 2007) led to the development of global, voidfilled Digital Elevation Models (DEM) (e.g., Jarvis et al., 2008) and derivatives such as the one developed by the HydroSHEDS project (Lehner et al., 2006), which have the advantage of being open access for the scientific community. Together with the huge amount of information brought by remotely sensed data, increasing advances in computational resources have encouraged the application of large-scale hydrodynamic models in many regions of the globe. In this context, the offline coupling between a hydrological and a fully 2D or 1D/2D channel/floodplain hydraulic model has proven a very promising approach to simulate floodplain dynamics in complex river systems (Biancamaria et al., 2009;Paz et al., 2011Paz et al., , 2014Schumann et al., 2013;Hoch et al., 2017;Munar et al., 2018). However, 2D flood inundation models are still computationally expensive depending on the modeling objectives and are strongly sensitive to DEM accuracy. The latter is important because high quality terrain datasets (e.g., LiDAR or TanDEM-X) with global coverage are not freely available to the general public (Yan et al., 2015;Sampson et al., 2016;Archer et al., 2018). This particular problem can be even more pronounced in densely vegetated areas such as the Amazon Basin, since the C-band radar of the widely used SRTM is not able to fully penetrate the vegetation canopy (Carabajal & Harding, 2005;Berry et al., 2007;O'Loughlin et al., 2016;Yamazaki et al., 2017).
A simpler approach used by some large-scale hydrodynamic studies is to adopt one-dimensional flow routing treating the floodplains as storage units (Yamazaki et al., 2011;Paiva et al., 2011). Notwithstanding the limitations of the 1D model structure, especially its inability to represent water movement in multiple directions, several researches have demonstrated its potential at regional to global scales with satisfactory results (Yamazaki et al., 2014;Ikeuchi et al., 2015;Mateo et al., 2017;Pontes et al., 2017;Zhao et al., 2017;Lopes et al., 2018;Siqueira et al., 2018). Moreover, a key improvement of recent studies, albeit initially proposed for 2D flood inundation modeling, is the use of the explicit local inertia approximation of the shallow water equations (Bates et al., 2010), which has been an interesting alternative to the full Saint-Venant hydrodynamic equations due to its relative efficiency and easier implementation. The ability to speed up model computations together with code parallelization (Yamazaki et al., 2013) can be crucial either for handling finer model resolutions or when several model runs are made necessary, for instance, for uncertainty assessment in ensemble flood forecasting (e.g., Pappenberger et al., 2005), data assimilation (e.g., Brêda et al., 2017) or parameter estimation (e.g., Dung et al., 2011).
Another key issue for large-scale river routing is to define how the river network is depicted within the model. Partitioning a basin in a regular latitude-longitude cell grid, thus providing a gridded-based network river map, is a very common practice since coupling groundwater, atmospheric and land surface models (LSM) can be easily done with a simple cell-by-cell relationship (e.g., Decharme et al., 2008). Conversely, large errors in length and slope of rivers, as well as the definition of flow directions over coarse resolution maps, must be treated using upscaling techniques (e.g., Paz et al., 2006;Paz & Collischonn, 2007;Yamazaki et al., 2009;Wu et al., 2011), while a realistic representation of channels and floodplains can be only achieved with a very fine grid cell resolution (Goteti et al., 2008;Lehner & Grill, 2013).
A basin can also be divided in several unit-catchments according to its underlying DEM (e.g., Beighley et al., 2009;David et al., 2011;Paiva et al., 2011Paiva et al., , 2013Luo et al., 2017;Siqueira et al., 2018) to preserve small-scale topographic features. This approach seems to be more suitable for hydrological models since topography strongly controls surface water storage and movement (Goteti et al., 2008), and a better connectivity of the river system can be ensured by the generation of high-resolution, vector-based network river maps (David et al., 2013;Paiva et al., 2011;Yamazaki et al., 2013;Lin et al., 2018).
Discussions regarding the choice of grid or vector-based discretizations are present in recent literature. There is a strong motivation towards the use of the latter approach especially for large-scale modeling, since: (i) it is possible to reduce computational demand due to the more flexible computational elements (Beighley et al., 2009;Lehner & Grill, 2013;Yamazaki et al., 2013); (ii) there is a smaller sensitivity of model parameters to its spatial resolution, which can improve model scalability because it respects topography properties (Tesfa et al., 2014); (iii) gaugeto-reach association becomes easier with a high resolution river network, giving rise to higher societal meaning (David et al., 3/19 2013), when water users can easily identify known features in the simulation, such as a known river segment in the hydrographic map; and (iv) it is more suitable for hydro-ecological applications that typically require river-reach scale resolution (Lehner & Grill, 2013). Because of these benefits, some river routing models have been implementing vector-based river maps for computations over continental domains, such as RAPID (Tavakoly et al., 2016) and mizuRoute (Mizukami et al., 2016), as well as for global scale simulations, as in HydroROUT (Lehner & Grill, 2013). More hydrologically consistent DEMs are also becoming popular for hydrological-hydrodynamic models applications in recent works (Yamazaki et al., 2019).
These advances in regional to global hydrologic and 1D hydrodynamic modeling studies come with a need of better understanding how choices in parameters and model structures affect the simulation results. Although the impact of cross-section geometry and channel roughness in variables such as flood extent and river water levels has already been addressed by some previous studies (e.g. Yamazaki et al., 2011;Paiva et al., 2013;Saleh et al., 2013;Luo et al., 2017;Pontes et al., 2017), there is a lack of literature when it comes to river network discretization in the context of modeling at large scales. In a recent work, Mateo et al. (2017) assessed the effect of varying spatial resolution and flow connectivity using a global river model (CaMa-Flood) in the Chao Phraya River basin (158,000 km 2 ). It was found that model predictions in very flat floodplains and deltas can benefit from finer spatial resolutions if multiple downstream connectivity is accounted for in model structure, while using single downstream connectivity not necessarily implies in better results when resolution is improved. On the other hand, to the authors knowledge, large scale evaluations regarding 1D model stability conditions (i.e., mass balance numerical issues) were not previously addressed, which is of importance as 1D hydrodynamic studies encompassing entire basins have been adopting criteria from 2D modeling that are known to be suitable for flat regions (e.g. Yamazaki et al., 2013;Pontes et al., 2017).
In the present work we explore the knowledge gap regarding the river network discretization for large scale hydrologichydrodynamic models, focusing on vector-based representation of drainage networks. Our driving questions are: what is the effect of the spatial discretization on the distribution of unit-catchments, and on simulated river discharge and water levels? What is the effect of spatial discretization on modeling numerical aspects?

METHODOLOGY
In this study, we used a vector-based, spatial discretization method that assumes unit-catchments with fixed river lengths. This method is interesting because it allows a better control of the total computational time of the hydrodynamic model, as discussed in the following text.
For the hydrologic-hydrodynamic simulation we used the MGB model (Modelo de Grandes Bacias in portuguese; Pontes et al., 2017) to assess impacts of catchment / river discretization on simulations, using the Purus river basin (an Amazon tributary) as a case study. The following sections provide a detailed description of the discretization method, the model, the study area, the input data and the performed assessments.

A length-delimited, vector-based river discretization for hydrologic-hydrodynamic modeling
An initial step needed to run a hydrological or a river routing model is to derive flow directions from the terrain data. For large-scale applications the most widely used approach is the deterministic eight-node (D8) method, where the direction that the water flows in a given pixel is assigned as a single direction towards the steepest slope among its eight surrounding neighbors (Jenson & Domingue, 1988). Although other approaches were proposed to account for multiple directions and better represent the water movement over terrain (Tarboton, 1997;Seibert & McGlynn, 2007), they can create diffuse and overlapping catchment area boundaries (Jones, 2002).
A pit removal procedure is often applied prior to flow direction in order to produce a "hydrologically corrected" (or conditioned) DEM, since the existence of flat regions or single topographical depressions leads to areas without a defined outlet and disconnected drainage patterns (Martz & Garbrecht, 1999;Planchon & Darboux, 2001;Wang & Liu, 2006;Buarque et al., 2009). Currently, global datasets of flow directions with extensive manual corrections are available at different spatial resolutions (e.g., Lehner et al., 2008), which can be suitable for many large scale hydrological and river routing modeling applications.
The sum of pixels along flow direction paths leads to a flow accumulation matrix, which needs to be reclassified in order to define the main channels. However, depending on the model domain, it is recommended to propagate the upstream area along the flow directions instead of the upstream number of pixels, in order to properly consider the area of the pixels, which can vary at different latitudes if the pixels have the same resolution in angular units, when using geographical coordinates (Paz & Collischonn, 2007). Drainage networks are then extracted by means of reclassification of the flow accumulation matrix using a constant area threshold, corresponding to the smallest area at which channel hydrodynamics is represented. Further discussions regarding the choices of area thresholds for river networks definitions, and even suggestions for the use of variable thresholds, can be found in Fan et al. (2013).
One of the common procedures to derive unit-catchments is to use well-known GIS packages, as an example, the ArcHydro definitions (Paiva et al., 2011;Yamazaki et al., 2013). Similar procedures are also used in free softwares such as QGIS (e.g., the GRASS GIS plugin). In this method, all pixels that are draining to a given river reach belong to the same unit-catchment, while a reach is defined by a river segment between two junctions or between a river spring and its downstream junction. Although the method provides a simple and logical description for the model topology, it produces high variability of reach lengths that are not suitable to more complex hydraulic modeling procedures. Paiva et al. (2011) addressed this problem assuming unit-catchments as floodplain units and dividing reaches according to a predefined maximum length, when using a full 1D Saint-Venant hydrodynamic model. For handling short and long reaches in a global inertial routing modelling, Yamazaki et al. (2013) adopted outlet rejection and outlet adding procedures according to minimum and maximum length thresholds, the latter equal to double the minimum threshold value.
In the present study, we adopted a fixed-length vector river discretization in order to provide equal flow distances and to enhance model time-step and runtime control, as well as to facilitate the coupling between hydrological and hydrodynamic modules. This discretization method is the same used by Siqueira et al. (2018) for continental hydrologic-hydrodynamic modeling in South America. The discretization procedure adopted is described below.
Step 1 -Marking outlets Starting from a previously delineated river network from usual definitions (Buarque et al., 2009), the initial step is to mark all the intermediate outlets corresponding to the very downstream point of a given river reach (orange boxes in Figure 1a). For that, the algorithm first identifies the river junctions by checking if two or more neighboring pixels of the D8 flow direction grid are over the river network and also drain to the analyzed pixel. In a positive case, the grid positions (row, column) of nearby upstream draining pixels (i.e., the intermediate outlets) and their respective flow accumulated areas are both stored in a vector. This procedure is repeated for each junction until the entire grid is evaluated and all the intermediate outlets are found. In addition, the attributes of the pixel with the largest upstream area (basin outlet) are also included in the vector of intermediate outlets, as it is not connected to any junction.
Step 2 -Delineating reaches and unit-catchments by a length threshold The next step is to segment streams by using a predefined reach length (threshold) and a stepwise approach. The algorithm begins at the basin outlet and performs the segmentation in the upstream direction (Figure 1b), so that the vector of intermediate outlets is sorted descending in terms of drainage area. After setting an accumulated length value of zero to the basin outlet (green square in Figure 1b), the algorithm starts tracing in the upstream direction and the accumulated length is updated at each pixel using the Distance Transforms method (Butt & Maragos, 1998) to improve distance calculations (Paz & Collischonn, 2007). Whenever the length threshold is exceeded, as indicated by the break lines in Figure 1b, the accumulated length is reset to zero and a unique number (ID) is assigned to all pixels belonging to the same river reach. It is worth noting that the algorithm selects the upstream pixel with the largest accumulated area (blue squares in Figure 1b) to keep tracing when a junction is found along the flow path. Moreover, if the length threshold is not met at the very upstream (headwater) pixel, the river network at this point is further extended following the pixel with the largest drainage area (dashed line in Figure 1b). It is important to note that the length can be smaller than the predefined value since it is topographically limited by headwater unit-catchment boundaries. Moreover, the extension of the river reach in the most upstream unit-catchment can impact the river length in the headwater areas (where there exist high uncertainties in drainage network definition), however the method is aimed to be applied at large scale hydrodynamic models, and smaller effects are expected in downstream reaches.
All the intermediate outlet pixels over the segmented river (white squares in Figure 1c) are marked as checked. The algorithm then seeks for the next intermediate outlet with the largest drainage area (green square in Figure 1c) that is marked as not checked to proceed with the segmentation for the next tributary. Because the algorithm starts from an intermediate outlet (accumulated length = 0) when tracing in the upstream direction, checked pixels must be neglected to avoid redefinition of some existing reaches. Following this rationale, the above procedure is repeated until the entire river network is completely segmented. Figure 2a shows the result of the segmentation process, and reaches are distinguished by orange and green colors for visualization purposes. Finally, unit-catchments are defined in the same way as in the traditional method, i.e., by identifying all pixels draining to the same river reach (or same ID). Therefore, model computational elements are constrained by topography and river lengths at the same time. Figure 2b shows the spatial discretization of the basin as a result of the above method.

The MGB model
Model description MGB is a conceptual, semi-distributed, hydrological model which has been widely used for large-scale modeling in South America from rapid-response basins to markedly seasonal and often slow response basins Fan et al., 2015Fan et al., , 2017Paiva et al., 2013;Siqueira et al., 2018;Fleischmann et al., 2019). In its most recent version, the basin is divided into unit-catchments (Paiva et al., 2011;Pontes et al., 2017), each one containing a single river reach with an associated floodplain and hydrological vertical water and energy balance. Combinations of soil type and land use within each unit-catchment are categorized as Hydrological Response Units (HRU).
The soil water balance is computed independently for each HRU of each unit-catchment, where surface runoff is generated with excess of storage capacity considering a statistical distribution of water in soil. Canopy interception is represented as a function of Leaf Area Index (based on the approach by Wigmosta et al., 1994) and evapotranspiration is calculated using the Penman-Monteith equation. Groundwater and subsurface flows are computed, respectively, with linear and non-linear functions according to water availability in soil layer. Runoff generated from water balance at each HRU is routed to the stream network using linear reservoirs, considering effects such as attenuation and delay within the unitcatchment. Flow routing in the drainage network is computed using the Muskingum-Cunge method , 1D full hydrodynamic (Paiva et al., 2013) or more recently the inertial approximation of shallow water equations .
Further details about the water balance module of MGB can be found in the general description presented by Collischonn et al. (2007) or by Siqueira et al. (2018).

Flow routing using local inertia equations
Regarding the river routing, flow in natural channels is usually represented by 1D full Saint Venant equations, expressed by continuity (Equation 1) and momentum conservation (Equation 2) equations:  Bates et al. (2010) and further tested by  and Fassoni-Andrade et al. (2018). In this method, the convective acceleration is neglected and flow variables in the friction term (|Q|Q) are written semi-implicitly (|Q t |Q t+∆t ), which in turn can be rearranged and solved with an explicit formulation as shown in Equation 3.
where: g is the acceleration due to gravity [m.s -2 ]; n is the Manning's roughness coefficient; ∆t is the model time step [s]; B is the channel width [m]; Q t and Q t+∆t are, respectively, the flow from previous The only prognostic variable is the total volume stored in channels and floodplains (Equation 4), whereas flow depth and discharge are derived from the total volume after solving the continuity and momentum equations, respectively. Evaporation losses are considered in Equation 4 by assuming flooded areas as open water and applying the Penman equation. Therefore, when flooding occurs in each unit-catchment, the surface area available for soil water balance is reduced proportionally from each HRU.
where V i is the total volume stored in channel and floodplains, for unit-catchment i [m 3 ]; Q in and Q out are, respectively, the inflow and outflow discharge of unit-catchment I [m 3 .s -1 ]; Evq i is the evaporation loss; i P the precipitation over open flooded areas [m 3 .s -1 ]; t and ∆t denote the previous and current time step, respectively.
In order to respect the Courant-Friedrichs-Lewy (CFL) condition, the maximum acceptable time step is adaptive and changes according to maximum water depth, following Equation 5 (Bates et al., 2010): where ∆x [m] is the river reach length; hmax [m] is the maximum water depth in the model domain and; α is a coefficient lower than unity and g is the acceleration due to gravity [m.s -2 ]. In the case of MGB, ∆x corresponds to the reach length chosen for the basin discretization Fleischmann et al., 2019). Further details about the flow routing equations of MGB model can be found in Pontes et al. (2017) and Siqueira et al. (2018).
Finally, regarding the fixed-length discretization described in this study, all tributaries contributing along a given river reach have their outlets artificially moved to the reach upstream point (i.e., green unit-catchments contributing to the blue unit-catchments at the center of Figure 2b).

Parameterization of floodplain topography and river hydraulics
The geometry of channel cross-sections is usually not available for large-scale basins. Hence, a very common approach is to adopt classic hydraulic geometry relationships for specific sites according to drainage area or discharge (Leopold & Maddock, 1953). In the present research we assumed a rectangular channel, where the river cross-section geometry (i.e., bankfull depth and channel width) is parameterized with power law equations (Decharme et al., 2008;Yamazaki et al., 2011;Miguez-Macho & Fan, 2012;Neal et al., 2012;Paiva et al., 2013;Domeneghetti, 2016;Luo et al., 2017).
Riverbed elevation can be roughly estimated using channel depth and riverbank height (i.e., channel top bank), the latter derived from a spaceborne DEM (Paiva et al., 2011;Yamazaki et al., 2011;Pontes et al., 2017). Since river bank height is a key parameter controlling flood frequency and extent (Miguez-Macho & Fan, 2012), noise often present in terrain data should be preferably smoothed to reduce both negative and excessive slopes in the bed profile, which can lead to excessive inundation and numerical instabilities in the model. Therefore, to define channel top bank elevations a smoothing procedure is carried out through a simple linear regression, which is fitted to DEM values within each unitcatchment considering only pixels located over drainage networks. The riverbank height is set as the smoothed elevation associated to the center pixel of the river reach while DEM values remain unchanged . Finally, riverbed elevations are estimated subtracting the channel depths from the bank heights defined above.
To represent floodplain inundation, a hypsometric curve relating flow depth, flooded area and water volume stored in both floodplain and channel for a given unit-catchment is derived from its underlying DEM. Concepts of the HAND model (Rennó et al., 2008) were adopted to compute water volume emulating the inundation process from lower towards nearby higher areas, which is the same approach adopted in CaMa-Flood by Yamazaki et al. (2013). Volumes are calculated through the numerical integration of flooded area with flow depth at each time step.
Since the HAND method allows the whole DEM to be normalized for the drainage network (Rennó et al., 2008), the hypsometric curve becomes insensitive to riverbank height estimation, for instance when applying the smoothing procedure. Nevertheless, it is worth mentioning that DEM conditioning procedures such as raising or lowering pixels (e.g., Yamazaki et al., 2012;Hoch et al., 2017) are not conducted here prior to riverbed and floodplain geometry estimation, so that elevations are maintained as close as possible to the original topographic data.

Study area
The Purus River is one of the main tributaries of the Amazon River ( Figure 3) with a drainage area of 370,000 km 2 and average discharge of 11,000 m 3 s -1 (Paiva et al., 2011). Land cover is mainly forest. Extensive floodplains ("várzeas") exist along the river which lead to major hydrograph attenuation, with the floodplain to channel width ratio reaching values as high as 30 (Melack & Hess, 2010;Junk et al., 2011;Paiva et al., 2011;Fleischmann et al., 2016). Surface water slopes are generally small (< 5 cm/km), and relevant backwater effects occur from the Amazon mainstem (Meade et al., 1991). We choose the Purus basin as the study case for the following reasons: (i) MGB was successfully applied in a previous work with hydrodynamic modeling (Paiva et al., 2011); and (ii) the existence of a strongly seasonal flood pulse and extensive floodplains in the lower portion of the basin.

GIS processing
We performed the GIS processing in two steps. Firstly, D8 flow directions and drainage networks were extracted from the SRTM DEM version 4 (Jarvis et al., 2008) using a GIS package 7/19 called IPH-Hydro Tools (Siqueira et al., 2016). Second, to estimate floodplain topography, we computed the HAND values using drainage networks and flow directions derived above together with the Bare-Earth SRTM DEM version 1, which combines multiple remote sensing datasets in order to reduce vegetation biases on SRTM Data (O'Loughlin et al., 2016).
Spatial resolution of both SRTM and Bare Earth SRTM datasets were resampled from 3 to 15-arcsec in order to reduce computational burden during GIS data processing. We choose to not derive flow directions directly from the Bare-Earth SRTM because the delineation of drainage networks is hampered by the vegetation removal. This is reasonable because in the original DEM the riparian vegetation helps delineating the presence of the water courses, once the elevation values are higher around the river.
The vegetation-removed DEM leads to a flat area along the river floodplain that can hinder a proper river delineation, since the flatter is the terrain, more complicated is to obtain flow directions. However, for other steps in the hydrodynamic model parameterization, especially related to the estimation of floodplain topography, the use of vegetation-removed DEMs is fundamental.
We used a minimum area threshold of 625 km 2 to match the resolution of the Multi-Source Weighted Ensemble Precipitation -MSWEP (Beck et al., 2017) precipitation grid cells (0.25º x 0.25º; see description below).

Land use, soil and hydrological data
The soil map used is a combination of the Brazilian database of soils and the Digitized Soil Map of the World and Derived Soil Properties (Food and Agriculture Organization, 2003), the latter needed to account for areas lying outside the Brazilian national limits. Land use classification was retrieved from the Global Land Cover map (Arino et al., 2012), a product generated using Envisat MERIS fine-resolution (300 m) satellite imagery over the

Meteorological forcing
Two meteorological datasets were used as input to MGB water balance module. The Multi-Source Weighted Ensemble Precipitation -MSWEP (Beck et al., 2017) was used as precipitation input, which is a 0.25º global dataset that optimally combines satellite, gauge and reanalysis data. Climate forcing was derived from the CRU Global Climate v.2 (New et al., 2002), a dataset that provides long term monthly averages (period of 1961-1990) for all land areas at 10' resolution for relative humidity, wind speed, sunlight hours and surface air temperature.

Experimental setup
To understand the effects of spatial discretization on model results we performed the following steps: 1. Model performance: The model was calibrated using in situ observed discharges to provide a representative, reference run for all assessments. In this case, a fixed length discretization of Δx = 10 km was adopted for river reaches within unit-catchments, and α = 0.3 was assumed as the adjustment factor to the routing time step resulting from CFL condition;

Spatial discretization visual inspection and statistical analysis:
Purus basin was spatially discretized in river reaches with  Table 1).
Δx = 5 km, 10 km (default), 30 km and 50 km, each one analyzed through visual inspection and from a statistical perspective; 3. Spatial discretization effects on model results: The model was run using the four different spatial discretizations (hereafter named only as 5 km, 10 km, 30 km and 50 km) and results were intercompared in terms of discharges and water levels. As the purpose of these simulations is to isolate the effect of discretization, model parameters for each run were kept the same as for the default simulation. It is not expected great sensitivity of model parameters to the adopted Δx because the unit-catchment approach respects the topographical limits (as noted by Tesfa et al., 2014); 4. Numerical and computational aspects: Following the previous analyses, all model runs were compared in terms of both numerical stability and computational efficiency, the former evaluated through a mass balance assessment. For each unit-catchment and simulation interval, mass error (M error ) in river storage was calculated based on Equation 4: dV/dt = Q in -Q out + M error , which can be rearranged in the accumulated form (total error) as (Equation 6): where: i is the unit-catchment index; NU is the total number

Flow limiter test:
The MGB model was run using the four spatial discretizations but considering a flow limiter in which the maximum allowed discharge between neighbor unit-catchments is the one that leads to Froude Number v Fr gh equal or less than one. The flow limiter does not imply mass imbalance (e.g., in flooded areas) because it limits only the flow between unit-catchments (i.e., reduce instantaneous discharges), so that the total water is always conserved in the system due to the continuity equation (Equation 1).
All simulations were performed using an Intel i5 2.2Ghz processor with 8GB of RAM. The 10 km reach-length was chosen as the reference value because it is a value commonly used in MGB model previous studies Fleischmann et al., 2019).
Model performance was assessed by three comparison statistics: the Nash-Sutcliffe Efficiency -NSE (Nash & Sutcliffe, 1970); the Kling-Gupta Efficiency -KGE (Kling et al., 2012); and the Delay Index -DI (Paiva et al., 2013). The latter consists in the time delay (in days) that leads to the highest Pearson correlation between observed and the simulated discharges. These performance metrics correspond to a 10-year simulation period from 01/01/2000 to 31/12/2009 (reference run).

Model adjustment
In general, model performance in Purus basin was considered satisfactory (Table 1). For gauges located in smaller drainage areas, KGE values were mainly between 0.6 and 0.9, while NSE values were around 0.55 and 0.75. For larger drainage areas (> 60,000 km 2 ), KGE and NSE values were always greater than 0.85. DI values were all between -1 and 5 days, being among -1 to 1 in 11 of the 14 locations studied.

Spatial discretization visual inspection and statistical analysis
Information regarding the spatial discretization is given by Figure 4 and Table 2. Figure 4a and Figure 4b show a colored visual example of the differences between the discretizations. Despite the larger number of unit-catchments and river reaches in the 5 km (3064) in comparison to the 50 km (400) threshold, the catchment border between tributaries is maintained equal since the resulting irregular grid follows the underlying topography. Therefore, the spatial delineation of unit-catchments is more detailed only for main rivers with respect to downstream/upstream borders, and catchments considered as headwater remain unchanged independently on the adopted Δx (see the one located at the center of boxes in Figure 4a).  Figure 4. (a) Detail of fixed-length discretization (5, 10, 30 and 50 km) within the corresponding box, with river reaches highlighted by different colors; (b) Accumulated frequencies of river bottom slope for each discretization; (c) Histogram of unit-catchment areas for each discretization. Colors in "a" do not refer to colors in "b" and "c". Figure 4c shows the accumulated frequencies of river bottom slope (m/km) resulting from each model discretization. While the 5 km leads to bottom slopes varying from -2.5 m/km to 2.5 m/km, the 50 km smoothens the profiles, with bottom slopes varying from -0.2 m/km to 1.5 m/km. Negative slopes occur because no correction is carried out over the original DEM (i.e., Bare-Earth SRTM), which are more pronounced in river bottom profiles with short lengths since they are more affected by DEM noise in flat areas. As expected, unit-catchment areas are smaller for smaller reach lengths. For example, most of the unit-catchment areas for 5 km and 10 km are smaller than 200 km 2 . In turn, for the 30 km and 50 km discretization, areas were more concentrated between 200 km 2 and 1200 km 2 , respectively. It can be noted that the fixed-length discretization does not lead to the exact Δx previously defined for unitcatchments. This occurs due to aspects related to the DEM resolution (15 arcsec ≈ 450 m), flow direction in adjacent pixels of river networks, especially in the border of unit-catchments (e.g., diagonal direction = 1.414 x DEM resolution), and limitations in headwater river reaches with larger Δx (such as the 50 km), since length is topographically limited by their upstream catchment boundaries.
Both local runoff and inflow of tributaries contributing along a given unit-catchment are added to the very upstream point of this latter unit-catchment. So, the longer the river reach is, the longer will be the total distance for flow propagation. At the same time, increasing reach length also leads to smoothing of river channel slopes. This can reduce the flood wave celerity and cause flows to be attenuated (in conditions without the effect of floodplains). However, differences in total river length were relatively small. Table 3 shows how total river length (down to the basin outlet) can change according to the adopted reach length.

Spatial discretization effects on model results
In this section, impacts of the spatial discretization on model results are evaluated using the same simulation period of the reference run. Results for discharge, flow depth and water level anomalies, at three locations (gauges 1355000 (3760 km 2 ), 13650000 (34,400 km 2 ), and 13880000 (236,000 km 2 )), are given by Figure 5 for the period between 2004 and 2006. To complement the analysis, Figure 6 shows a comparison between maximum and minimum (full period -calibration and validation) calculated discharges at each unit-catchment and for each discretization. For this, we computed the mean of minimum and maximum discharges of independent events that met a pre-defined flow threshold, i.e. (local) low flows below Q 80 (lower values from the flow duration curve) and peak flows higher than Q 20 (higher values from the flow duration curve) of the simulated time series. In general, discharge seems to be not largely affected by the adopted Δx. As an integrative variable that depends on all discharge generated in  the upstream areas, discharges are less sensitive to smaller changes in the volumes of individual reaches. Hydrographs tended to be similar independently of the reach length (Figure 5a), while slightly higher differences appeared on the extreme values. The only exception was observed for minimum discharge values at intermediate drainage areas (Figure 5b). For an area of 2.5×10 5 km 2 the minimum flow varied between 1500 m 3 s -1 (50 km reaches) and 1300 m 3 s -1 (5 km and 1 km reaches).
The differences in minimum flows at intermediate drainage areas can be related to the flow depth ( Figure 5b). As one can see, the flow depth largely differs between the tested reach lengths at the gauge 13650000 (34,400 km 2 ), which is a representative point of intermediate drainage areas. At gauge 13650000, the 50 km reach length simulated flow depth was 10 m lower than the 5 km reach length simulation. With a lower depth, the 50 km simulation has lesser flooding areas, resulting in more in-bank flow during higher discharges, as water does not flow to floodplains in these cases. For this reason, simulated flows with smaller Δx tend to be delayed in comparison to the ones obtained with larger reach lengths, as observed in gauge 13880000.
At other scales these effects are not as important as found in the intermediate scale. In the smaller scale (gauge 1355000 -3760 km 2 ), differences are not much pronounced, and in larger scales (gauge 13880000 236,000 km 2 ) there is a difference of 2 m in the flow depth between discretizations of 50 km/30 km and 10 km/5 km, which is not as expressive as the ones in the intermediate scales and not as expressive on discharges. Likewise, it is interesting to mention that 5 km and 10 km simulation results were also closer to each other in the larger scale, as the 30 km and 50 km were also similar between them.

Numerical and computational aspects
Computational burden and model instabilities related to numerical errors are relevant drawbacks of large-scale hydrodynamic routing. Besides, these problems are somehow linked: despite the adoption of a larger Δx can increase model time step and speedup computations for explicit schemes governed by the CFL condition (e.g., local inertial formulation), flow routing is more subject to mass balance errors because numerical issues usually arise from larger time steps. In this section we assess the impact of discretization on model efficiency and mass conservation to identify a tradeoff between both characteristics. Moreover, as the time step of the local inertial scheme computed by CFL condition can be adjusted by the α coefficient (Equation 7), we perform additional simulations with increasing α values (0.4, 0.5, 0.6 and 0.7) to assess the relevance of this parameter.
In order to provide a suitable comparison regarding numerical stability, total mass error (M error described in Equation 6) was first expressed in terms of a mean discharge dividing M error [m 3 ] by the number of simulation time steps, thus producing an estimate of the flow error Q error [m 3 s -1 ]. Further, the ratio between Q error and the mean discharge of Purus basin at its outlet point was computed for each Δx and α value, and results were plotted against their respective simulation times (Figure 7).
According to results, the difference in processing times between discretizations can be considered relevant. While both simulations using 50 km and 30 km discretizations were completed in less than one minute even for lower α values, the simulation time for the 10 km reach varied between 2.5 (α = 0.7) and 6 minutes (α = 0.3). Also, the simulation time using the 5 km reach was around five times slower than the 10 km one, reaching a maximum simulation time of almost half an hour (α = 0.3). Other than expected, model runs using different α values for the same Δx did not follow a linear pattern with respect to computational cost. This can be clearly seen for Δx = 5 km, since there is a major time difference in simulations with α between 0.3 and 0.4 when compared to the time difference with α between 0.6 and 0.7. Regarding mass balance, the adoption of the lowest α (0.3) led to errors less than 0.01% (10 -4 ) in all simulations. As α increases, errors also increase, reaching almost 5000% when using Δx = 50 km and α = 0.7. On the other hand, although the 5 km simulations resulted in relatively small errors, the associated simulation times were always the largest ones.
To further assess the impacts of the selected α value on model instabilities, boxplots of the Froude number (Fr) for each previous run were plotted in Figure 8. Each boxplot contains the maximum Fr value (considering the entire simulation period) of each unitcatchment for a given discretization, separated by different α values. As we can see, simulations with 5, 10, 30 and 50 km discretizations produce Fr values higher than unity (i.e., supercritical flows and associated numerical instabilities) for α values ≥ 0.7, 0.6, 0.5 and 0.4, respectively. To some extent, this agrees with results previously shown in Figure 7, albeit Fr > 1 not always led to significant mass errors (less than 0.1% [10 -3 ] for limit values of α, except for the 50 km discretization). This indicates that numerical instabilities are likely to occur in some specific reaches, but errors are not necessarily propagated to other parts of the basin. Figure 9 shows the evolution of model time step in the simulation period considering an α value of 0.7. It is important  to highlight that the model time step is adaptive and calculated at each time interval by Equation 5, as flow depths are constantly changing over time. It can be noted that the 5 km simulation resulted in an almost constant Δt (around 100 seconds). These low and nearly constant values occur because a small Δx leads to more noisy river bottom profiles. Noisy bottom profiles can produce "artificial reservoirs" that increase flow depth at corresponding unit-catchments (see, for instance, gauge 13650000 in Figure 5), so that higher "h" values are used to compute Δt through CFL condition (Equation 5), leading to lower Δt values. As the relative variation in flow depth tends to be low in cases with "artificial reservoirs", it consequently forces the model to keep a small time step for the entire simulation. In contrast, model time step in the 50 km simulation seems to be more adaptive to changes in flow depths (variations of Δt between 1000 and 1500 seconds), since bottom profiles are smoothed according to reach length, which results in lower water depths. That suggests that DEM correction methods with respect to noise reduction can be useful to reduce model time step in hydrodynamic simulations.

Flow limiter test
Previous tests indicated that the simulations can present numerical instabilities that lead to mass balance errors, even when using constant river reach lengths and adaptive time steps to control the CFL condition (Equation 7). These numerical issues were more relevant in simulations with longer river reaches (e.g., 50 km), which are also the most computationally efficient ones. As longer reach lengths (e.g., 50 km) would allow the best computational times, one interesting question is how to reduce their numerical instabilities in order to be able to adopt them in large scale models.
Then, in a final test we evaluated the usage of a simple flow limiter, based on the Froude number, by assessing the resulted oscillations and mass errors. The MGB model was run using the four tested spatial discretizations, using α values of 0.3 and 0.7, and considering a flow limiter in which the maximum allowed discharge between unit-catchments is the one in which the instantaneous Froude Number is equal to one. The adopted flow limiter led to almost zero mass errors, i.e. it was able to reduce mass error instabilities. However, although with no relevant mass error, some hydrograph oscillations persisted (visual inspection) in the simulated discharges with α = 0.7, while none was observed in the scenario with α = 0.3. Then, Figure 10 presents a spatial analysis of where these oscillations occurred in the scenarios with α = 0.7 when using the Froude flow limiter. Since the scenario with α = 0.3 did not present oscillations, we applied a simple method to evaluate in which reach occurred oscillations by computing the correlation (Pearson correlation) between discharges simulated with α equal to 0.3 and discharges simulated with α equal to 0.7 discharges. Oscillations in river reaches were classified into the following categories: no oscillations (correlation higher than 0.99), intermediate (correlation between 0.75 and 0.99) and large (correlation lower than 0.75) oscillations.
From our criteria, the usage of a flow limiter resulted in simulations with almost no discharge oscillations using α = 0.7 and reaches of 5 km and 10 km. The 30 km simulation had some intermediate and large oscillations mainly at upstream reaches due to higher slopes, while the 50 km simulation led to a considerable number of oscillations over all the Purus basin, with a higher number of occurrences in the middle-upstream region.
From our perspective, a Froude number-based flow limiter helps to avoid mass balance errors and oscillations in general, but does not totally eliminate them, especially when a longer reach is used in the discretization. Our results show that a ∆x of 30 km with the proposed Froude number-based flow limiter could be satisfactorily adopted to simulate the Purus river basin.

CONCLUSIONS
With the proposal of further understanding the effects of river network discretization on large-scale hydrologic-hydrodynamic models, the present research provided insights about relevant issues in setting up model configurations prior to a given application. Most of the insights were related to models which use the explicit local inertial approximation of shallow water equations (Bates et al., 2010) to perform flow routing, which is a usual setup for state-ofthe-art regional and continental models. Based on the obtained results, we consider that the main contributions of the study are: • A fixed-length, vector-based discretization method can easily connect hydrological models that are built over topographical divisions (i.e., unit-catchments, sub-basins) to a hydrodynamic routing module; • Anomalies of water level can be reasonably well simulated between different values of Δx, but this is not valid for flow depths. Therefore, a DEM correction method to reduce noise in river bottom profiles (or methods to ensure a correct profile representation) should be preferred if simulation of water surface elevations (or flood extent) is desired; • For the explicit local inertial routing, Δx must be cautiously chosen together with a time step correcting factor (α) for CFL condition. Although more efficient in terms of processing time, a larger Δx is more subject to supercritical flows and numerical oscillations if small values of α are not adopted. Smaller Δx values demand more processing time, but present less numerical issues even for higher values of α. For large-scale hydrological-hydrodynamic modeling encompassing entire basins, the use of low α values (such as 0.3) is recommended; • A simple flow limiter based on Froude number can help to avoid numerical issues, but is not enough to eliminate flow oscillations, especially for the case of larger Δx and higher values of α.
Finally, and summarizing all the contributions, we can state that a smaller α of the CFL condition had less impact in the total processing time for longer Δx and was satisfactorily applied to control the model stability.