Journal of the Brazilian Society of Mechanical Sciences

Print version ISSN 0100-7386

J. Braz. Soc. Mech. Sci. vol.22 n.1 Rio de Janeiro  2000 

Turbulent Shallow-Water Model for Orographic Subgrid-Scale Perturbations


Norberto Mangiavacchi
Alvaro L. G. A. Coutinho
Nelson F. F. Ebecken
Center for Parallel Computations - COPPE/Federal University of Rio de Janeiro - PO Box 68506, RJ 21945-970, Rio de Janeiro, Brazil
e-mail: norberto



A parallel pseudo-spectral method for the simulation of the shallow-water equations in primitive form on distributed memory computers was developed and used to study turbulent shallow-water LES models for orographic subgrid-scale perturbations. The main characteristics of the code are: momentum equations integrated in time using an accurate pseudo-spectral technique; Eulerian treatment of the advective terms; and parallelization based on a domain decomposition technique. The parallel pseudo-spectral code is efficient on various architectures: it gives high performance on vector computers and good speedup on distributed memory systems.
The code is being used to study the interaction mechanisms in shallow-water flows with regular as well as random orography with a prescribed spectrum of elevations. Simulations show the evolution of small-scale vortical motions arising from the interaction of the large-scale flow with the small-scale orographic perturbations. These interactions transfer energy from the large-scale motions to the small (usually unresolved) scales. The possibility of including a parametrization of these effects in turbulent LES subgrid-stress models for the shallow-water equations is addressed.
Keywords: Turbulent flows, shallow-water equations, subgrid-scale models, orographic perturbations, parallel processing, spectral methods




Introduction

The shallow-water model is used as a testbench for understanding many fundamental dynamical problems, such as atmospheric flows, tides, storm surges, and river and coastal flows. In many such applications, the flow is mostly a two-dimensional turbulent flow.

Numerical simulations are usually performed using Large-Eddy Simulation (LES) techniques, which require the parametrization of scales smaller than the computational grid; this is done by subgrid-scale models. The best known subgrid-scale model for LES of turbulent flows is the Smagorinsky model [1]. Later models, such as those of Germano [2] and Lilly [3], overcome some of the Smagorinsky model's limitations, in particular the need to specify a parameter and the tendency to be over-dissipative.

None of these models, however, addresses the effects of the perturbations introduced by an irregular orography with scales smaller than the computational grid. Small-scale orographic perturbations play an important role in the dynamic cascade in turbulent shallow-water flows, and neglecting these effects can degrade forecast capabilities in atmospheric simulations. Therefore, better subgrid-scale models that include subgrid-scale orographic effects may result in significantly improved weather forecasts.

Numerical simulations of the detailed behaviour of turbulent shallow-water flows in the presence of small-scale orographic perturbations can provide a fundamental understanding of the mechanisms involved in such flows, and provide valuable data for the development and validation of new subgrid models. Such simulations require space- and time-accurate integration of the turbulent flow field and, in addition, an accurate model of the effects due to the orography.

Since such simulations are very computationally intensive, higher resolution shallow-water flow simulations can profit from the scalable performance available on parallel architectures. Pseudo-spectral methods, owing to their high accuracy and performance in simple domains and their intrinsic parallelism, can be applied successfully to the simulation of shallow-water flows on distributed memory computers. In this work we describe a new parallel pseudo-spectral code designed to perform high resolution, space- and time-accurate simulations of shallow-water flows on various current distributed memory architectures. The parallel algorithm exploits the intrinsic parallelism of the pseudo-spectral method and is based on a domain decomposition approach. The remainder of this paper is organized as follows. In the next section we briefly review the governing equations of shallow-water flows. The section that follows details our pseudo-spectral method, based on Fourier expansions. The next section presents the parallel implementation, designed to achieve high performance while retaining portability across different platforms. The following section shows the numerical results for a test case, discusses the parallel performance of the code on several computers, and presents results of several simulations of the shallow-water equations with various orographic perturbations. Finally, the paper ends with a summary of the main conclusions of this work.


Governing Equations

The governing equations are the shallow-water equations, also known as the Barré de Saint-Venant equations. The derivation is given, for example, by Lesieur [4]. Here we restate the main assumptions.

We start from the Navier-Stokes equations locally on a sphere, for a fluid of constant uniform density \rho. The fluid is assumed to have a free surface with elevation H(x, y, t) above a reference plane, and to lie above a topography of height hs(x, y). The depth of the fluid layer is h(x, y, t), and the surface elevation is H = h + hs (see Figure 1).


[Figure 1: geometry of the shallow-water layer: free surface elevation H above the reference plane, topography height hs, and fluid depth h = H - hs]


The pressure at the free surface is uniform and equal to p0. The pressure is hydrostatically distributed along the vertical, and the horizontal velocity field v depends only on the horizontal space variables x and y, and on time. Integrating the equations along the vertical, one obtains

\frac{\partial \mathbf{v}}{\partial t} + (\omega + f)\, \hat{\mathbf{z}} \times \mathbf{v} = -\nabla\!\left( gH + \tfrac{1}{2}|\mathbf{v}|^2 \right) + \nu \nabla^2 \mathbf{v}

\frac{\partial h}{\partial t} + \nabla \cdot ( h \mathbf{v} ) = 0

where \omega = \hat{\mathbf{z}} \cdot ( \nabla \times \mathbf{v} ) is the vorticity, f = 2\Omega \sin(\theta) is the projection of the planetary vorticity 2\boldsymbol{\Omega} onto the local vertical, \Omega = |\boldsymbol{\Omega}|, and \theta is the local latitude.


The Numerical Method

We use a Fourier pseudo-spectral method to discretize the system of partial differential equations into ordinary differential equations. The pseudo-spectral approach avoids the high cost of computing quadratic and higher-order terms in the Fourier representation, which would require convolutions, while still preserving the spectral convergence properties of standard spectral methods. The pseudo-spectral approach gives the best computational cost/benefit ratio for simple geometries, in particular for periodic domains.

We proceed by expanding the flow variables in Fourier series in the x and y horizontal directions,

\mathbf{v}(x, y, t) = \sum_{\mathbf{k}} \hat{\mathbf{v}}_{\mathbf{k}}(t)\, e^{i (k_x x + k_y y)}, \qquad h(x, y, t) = \sum_{\mathbf{k}} \hat{h}_{\mathbf{k}}(t)\, e^{i (k_x x + k_y y)}

The thickness h is updated using

\frac{\partial \hat{h}_{\mathbf{k}}}{\partial t} = -\left[ \widehat{\nabla \cdot (h \mathbf{v})} \right]_{\mathbf{k}}

The time integration of the transport equations is performed using operator splitting. The velocity field is advanced using Adams-Bashforth for the advection terms,

\mathbf{v}^{*} = \mathbf{v}^{n} + \Delta t \left[ \tfrac{3}{2} \mathbf{A}(\mathbf{v}^{n}) - \tfrac{1}{2} \mathbf{A}(\mathbf{v}^{n-1}) \right]


\mathbf{A}(\mathbf{v}) = -(\omega + f)\, \hat{\mathbf{z}} \times \mathbf{v} - \nabla\!\left( gH + \tfrac{1}{2}|\mathbf{v}|^2 \right)

and Crank-Nicolson for the viscous terms,

\frac{\mathbf{v}^{n+1} - \mathbf{v}^{*}}{\Delta t} = \frac{\nu}{2} \nabla^2 \left( \mathbf{v}^{n+1} + \mathbf{v}^{*} \right)
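As an illustration of this splitting, the following is a minimal single-variable sketch in Python (not the paper's parallel Fortran code): the same Adams-Bashforth/Crank-Nicolson combination applied pseudo-spectrally to the 1D viscous Burgers equation on a periodic domain. All names and parameter values are illustrative.

```python
import numpy as np

def step(u, adv_prev, k, dt, nu):
    """One operator-splitting step for 1D Burgers u_t + u u_x = nu u_xx
    on a periodic domain: AB2 for advection, Crank-Nicolson for viscosity."""
    # Pseudo-spectral advection term: derivative in Fourier space,
    # product in physical space.
    ux = np.fft.ifft(1j * k * np.fft.fft(u)).real
    adv = -u * ux
    if adv_prev is None:                       # startup step: forward Euler
        u_star = u + dt * adv
    else:                                      # Adams-Bashforth 2
        u_star = u + dt * (1.5 * adv - 0.5 * adv_prev)
    # Crank-Nicolson viscous update; the operator is diagonal in Fourier space.
    u_hat = np.fft.fft(u_star)
    u_hat *= (1 - 0.5 * dt * nu * k**2) / (1 + 0.5 * dt * nu * k**2)
    return np.fft.ifft(u_hat).real, adv

N, nu, dt = 128, 0.05, 1e-3                    # illustrative values
x = 2 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)               # integer wavenumbers
u, adv_prev = np.sin(x), None
for _ in range(1000):                          # integrate to t = 1
    u, adv_prev = step(u, adv_prev, k, dt, nu)
```

Because the viscous operator is diagonal in Fourier space, the Crank-Nicolson solve reduces to a per-mode division, which is what makes the implicit treatment essentially free in a pseudo-spectral code.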

To enable the high resolution simulations required for direct numerical simulation, the code was parallelized using a domain decomposition approach, which is described next.


Parallel Algorithm

The parallel algorithm exploits the intrinsic parallelism of the pseudo-spectral technique. It is based on a domain decomposition approach in a broad sense. Three kinds of operations are involved in the parallel pseudo-spectral method: (1) computation of products in physical space; (2) inversion of the Poisson operators, computation of derivatives, and filtering operations in Fourier space; and (3) computation of the discrete two-dimensional Fourier transforms. The first two kinds of operations can be performed without any communication when the domain is partitioned among the processors in physical space and in Fourier space, respectively. The only part that requires communication is the computation of the discrete two-dimensional Fourier transforms. Here also, for portability, we have chosen the transposition approach, which allows the use of highly optimized single-processor one-dimensional FFT routines, normally available on most architectures, together with a transposition algorithm that can be easily ported to different distributed memory architectures.

The transposition algorithm used in the FFT is the so-called direct transposition algorithm, in which at each stage of the communication each node sends to its partner all the data that have that partner as final destination. At each stage the processor pairs are defined using a mapping of the processors onto a hypercube and a relative addressing strategy. The number of stages is p - 1, where p is the number of processors. The transposition algorithm can be summarized as follows:

  • For i =1, 2, ..., p-1 do:

Each node m collects and sends to n = XOR(m, i) all blocks that have node n as final destination, and replaces them with the blocks it receives from n.

  • Unshuffle the resulting data in each node.

Here XOR stands for the exclusive-or boolean operation. The Poisson operator is diagonal in Fourier space. After decomposition of the domain among the nodes, the Poisson operator requires O(N^2/p) computations per node at each time step, and no communication. Here N is the resolution in each dimension. The two-dimensional FFTs require O(2 N^2 log2(N)/p) computations and p - 1 bidirectional communications of length N^2/p^2 when using the described transposition algorithm. Since the latency time is much shorter than the communication time for large problem sizes, the total communication time is essentially O(N^2/p). Therefore the ratio of communication to computation is O(1/log2(N)). Except for a few global operations, all the communications are lumped in the transposition algorithm, making it easily portable to other distributed memory computers. The implementation using PVM, like the one given in [5], can run on a CRAY T3D and an IBM SP2 without changes to the transposition algorithm, requiring only loading the appropriate machine-dependent single-processor one-dimensional FFT routines. To achieve better performance on the SP2, versions employing the MPI and MPL libraries were also implemented with minor additional effort, with considerable improvement in performance.
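The XOR pairing schedule described above can be sketched in a few lines of Python. This is an illustrative simulation of the schedule only (no actual message passing); it checks that in p - 1 stages every node is paired exactly once with every other node, so that each block can reach its final destination.

```python
def transpose_schedule(p):
    """Pairing schedule for the direct transposition algorithm:
    at stage i, node m exchanges with n = m XOR i.
    p must be a power of two (hypercube mapping)."""
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    return [[(m, m ^ i) for m in range(p)] for i in range(1, p)]

# Record which partners each node talks to over the p-1 stages.
p = 8
partners = {m: set() for m in range(p)}
for stage in transpose_schedule(p):
    for m, n in stage:
        # the pairing is symmetric: (m ^ i) ^ i == m
        partners[m].add(n)
```

Since XOR with a fixed stage index i is an involution, each pair exchanges data simultaneously, and as i runs over 1, ..., p-1 the partner m ^ i runs over every node other than m.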


Single Node and Parallel Performance

To evaluate the performance of the code on parallel machines, two-dimensional computations were performed with resolutions ranging from 256x256 to 1024x1024.

High single-processor performance is obtained across various platforms by using the optimized high-performance libraries provided for the FFTs. The first and third cases in Table 1 (SP2 wide node, using FORTRAN and the ESSL library, respectively) show that single-processor performance on the IBM SP2 can be improved by a factor of 5 by using the ESSL library. In the CRAY T3D implementation, the scalar FFT library subroutines CCFFT, SCFFT, and CSFFT were used. In the CRAY J90 and T90 versions, the vector library FFT subroutines CFFTMLT, SCFFTM, and CSFFTM were used. Parallel performance was measured on a 4-processor IBM SP2 and on a 32-processor CRAY T3D. When using the PVM libraries on the SP2, the speed-up curves show that the parallel efficiency drops to about 75% for two processors. This is caused by the transposition algorithm, which is not present in the single-processor case and which introduces a significant overhead even when only 2 processors are used. However, efficiency remains above 50% even for considerably larger numbers of nodes, as long as the problem size is adequate. When using the dedicated MPL library on the SP2 we obtain some improvement in efficiency (about 80% on two processors for 256x256), showing that the communication cost is reduced by using the MPL library. In fact, the efficiency is about 50% even when using 32 T3D nodes for the 1024x1024 case. Hence, a CRAY T3D with 32 nodes outperforms a single CPU of a CRAY J90 running in vector mode. The scalability of the parallel code clearly indicates its potential to beat more powerful vector processors, such as the T90, if more processors were added to the parallel machine.


[Table 1: single-processor FFT performance on the IBM SP2, CRAY T3D, J90, and T90]


[Figure 2: speed-up curves on the IBM SP2 and CRAY T3D]



Within the scope of this work we analyze the case of negligible rotation (f = 0). The following simulations were performed using a physical space resolution of 64 by 64 points, unless otherwise specified.


Test problem

A number of runs were performed to assess the correctness of the code. One usual test for shallow-water solvers is the dam-break problem. In this problem, there is a sharp variation in the surface elevation H, in otherwise quiescent initial conditions. The elevation is H = H0 to the left and H = H1 to the right of the dam. From this initial sharp front, two waves propagate in opposite directions at two different speeds. The wave travelling into the shallower side is a shock wave, while the wave propagating into the deeper side is a rarefaction wave. This test assesses the ability to accurately simulate the propagation of sharp waves in the surface elevation. A typical result from this test is shown in Figure 3.
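As a rough sanity check on the wave speeds in this test, one can compare the linear shallow-water gravity wave speed c = sqrt(g h) on the two sides of the dam. This is only a sketch with illustrative depth values, not the full nonlinear dam-break (Stoker) solution.

```python
import math

def gravity_wave_speed(g, h):
    """Linear shallow-water gravity wave speed c = sqrt(g h)."""
    return math.sqrt(g * h)

g, H0, H1 = 9.81, 2.0, 1.0      # illustrative deep (left) and shallow (right) depths
c_deep = gravity_wave_speed(g, H0)
c_shallow = gravity_wave_speed(g, H1)
# The rarefaction head moves into the deeper side at roughly c_deep,
# faster than the linear wave speed on the shallower side.
```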


[Figure 3: dam-break test, showing the propagation of the shock and rarefaction waves in the surface elevation]


A two-dimensional extension of the dam-break problem was also simulated, in order to verify that the code accurately accounts for more complex two-dimensional wave interactions. Results are shown in Figure 4.


[Figure 4: two-dimensional dam-break simulation]


Regular Orography

Some simulations were performed using regular orographic perturbations. In these simulations the orography is given by a sinusoidal perturbation at a single wavenumber.

Two different kinds of initial conditions were analyzed. In the first case, shown in Figure 5, a single vortex with a scale larger than that of the orographic perturbation is left to evolve.


[Figure 5: evolution of a single large-scale vortex over a regular orographic perturbation]


In the absence of the orographic perturbation, the vortex undergoes a slow decay, since the background viscosity is very small. In the presence of the orographic perturbation the vortex develops small-scale peaks, which originate from interactions between the vorticity field and the orography. The occurrence of the peaks can be explained simply by conservation of potential vorticity (w + f)/h. When passing over deeps, h increases, causing an increase of vorticity by vortex stretching; the opposite occurs over the heights.
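The vortex-stretching argument can be checked with a one-line computation. Assuming only conservation of potential vorticity (w + f)/h for a material column, the vorticity after a depth change is fixed by the depth ratio; the numbers below are illustrative.

```python
def vorticity_after_column_stretch(omega0, f, h0, h1):
    """Relative vorticity of a fluid column whose depth changes from h0
    to h1, assuming conservation of potential vorticity (omega + f)/h."""
    return (omega0 + f) * h1 / h0 - f

# With f = 0 (negligible rotation, as in these simulations):
# passing over a deep (h: 1.0 -> 1.2) stretches the column and
# amplifies the vorticity; passing over a height (h: 1.0 -> 0.8) damps it.
w_deep = vorticity_after_column_stretch(1.0, 0.0, 1.0, 1.2)
w_high = vorticity_after_column_stretch(1.0, 0.0, 1.0, 0.8)
```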

In the second case, the initial conditions are given by a turbulent flow, as shown in Figure 6.


[Figure 6: turbulent flow initial condition evolving over a regular orographic perturbation]


In this case, the interaction causes turbulent motions with scales close to the scale of the orographic perturbation to decay faster than in the unperturbed case.

The effect of the orographic perturbation can be better visualized by looking at the energy spectra E(k) of the two flows, shown in Figure 7. Two kinds of effects can be seen. The first and most conspicuous effect is the forcing of modes close to the perturbation wavenumber k0 and its harmonics. This forcing causes the peaks observed in the spectrum at discrete points. The other effect is the reduction of energy at scales away from the forcing frequencies, in particular at the large scales (small k). This is the most important effect as far as the parametrization of subgrid-scale perturbations is concerned.
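A spectrum like E(k) can be computed from a velocity field by shell-summing the Fourier energy density. The following is a minimal sketch, assuming a square periodic domain with integer wavenumbers; it is not the code used in the paper.

```python
import numpy as np

def energy_spectrum(u, v):
    """Isotropic kinetic energy spectrum E(k) of a 2D periodic velocity
    field, obtained by summing 0.5*(|u_hat|^2 + |v_hat|^2) over shells
    of integer wavenumber magnitude."""
    n = u.shape[0]
    uh = np.fft.fft2(u) / n**2
    vh = np.fft.fft2(v) / n**2
    e2d = 0.5 * (np.abs(uh)**2 + np.abs(vh)**2)
    kx = np.fft.fftfreq(n, d=1.0 / n)
    kmag = np.rint(np.hypot(*np.meshgrid(kx, kx, indexing="ij"))).astype(int)
    E = np.zeros(kmag.max() + 1)
    np.add.at(E, kmag.ravel(), e2d.ravel())   # shell sum
    return E

# Check: a single Fourier mode sin(4x) puts all its energy in shell k = 4.
n = 64
x = 2 * np.pi * np.arange(n) / n
u = np.sin(4 * x)[None, :] * np.ones((n, 1))
E = energy_spectrum(u, np.zeros_like(u))
```

With this normalization E(k) sums to the mean kinetic energy per unit area, so forcing peaks at k0 and its harmonics show up directly as spikes in E.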


[Figure 7: energy spectra E(k) with and without the orographic perturbation]


Irregular Orography

More complex behaviour can be expected when the orography has a continuous spectrum of perturbations. Two particular cases are of interest. In the first case the spectrum of the perturbation is similar to the spectrum of the velocity, while in the second case the spectrum of the orographic perturbation is peaked at higher modes. The elevation spectra for these two cases are shown in Figure 8, and the resulting spatial distributions are shown in Figure 9.
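An orography with a prescribed continuous spectrum can be generated by assigning random phases to a chosen spectral envelope. The sketch below uses an illustrative Gaussian-shaped isotropic envelope; the paper's actual elevation spectra are those of Figure 8.

```python
import numpy as np

def random_orography(n, k_peak, width, seed=0):
    """Random periodic elevation field whose spectrum is a Gaussian
    envelope peaked at wavenumber k_peak (illustrative stand-in for a
    prescribed elevation spectrum)."""
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n, d=1.0 / n)
    kmag = np.hypot(*np.meshgrid(kx, kx, indexing="ij"))
    amp = np.exp(-0.5 * ((kmag - k_peak) / width) ** 2)   # spectral envelope
    phase = np.exp(2j * np.pi * rng.random((n, n)))        # random phases
    hs = np.fft.ifft2(amp * phase).real
    return hs - hs.mean()                                  # zero-mean elevation

hs1 = random_orography(64, k_peak=4.0, width=1.0)    # large-scale terrain
hs2 = random_orography(64, k_peak=16.0, width=2.0)   # small-scale terrain
```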


[Figure 8: elevation spectra for the two irregular terrains]


[Figure 9: spatial elevation distributions for the two irregular terrains]


The resulting kinetic energy spectra for the turbulent flow without disturbance, and for the two types of terrain, are shown in Figure 10. While for terrain 1 the effects of the orographic perturbations are already quite considerable, the effects due to terrain 2 are certainly dramatic, leading to a much higher kinetic energy level at the small scales and a lower energy level at the large scales.


[Figure 10: kinetic energy spectra for the undisturbed flow and for the two terrains]


Considerations on Parametrization of Orographic Subgrid-scale Perturbations

The results of the previous simulations have shown that the effect of the subgrid orographic perturbations is not a simple dissipative mechanism; instead it acts over the full range of dynamic scales. The transfer of energy among scales is enhanced and, in the case of an orography with an elevation spectrum containing perturbations with high energy at small scales, there is a strong transfer of kinetic energy from large to small scales.

No simple dissipative model can reproduce this kind of mechanism with accuracy. In order to take these effects into account in large-eddy simulations, one crude approach is to add a perturbation at the smallest resolved scales with the same energy as the subgrid-scale perturbations. A very promising approach, however, is to use upscaling techniques to accurately compute the energy transfer due to the smallest scales, and to use this information in the large-scale model.
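The crude approach mentioned above, injecting a perturbation that carries the subgrid-scale energy, amounts to a simple rescaling. The sketch below is hypothetical: the perturbation field, the target energy value, and all names are illustrative, not taken from the paper.

```python
import numpy as np

def add_scaled_perturbation(u, perturbation, target_energy):
    """Rescale a small-scale perturbation so that it carries a prescribed
    kinetic energy per unit length, then add it to the resolved field.
    This is the crude closure sketched in the text, not a validated model."""
    e = 0.5 * np.mean(perturbation**2)          # current energy of the perturbation
    return u + perturbation * np.sqrt(target_energy / e)

rng = np.random.default_rng(1)
u = np.sin(2 * np.pi * np.arange(128) / 128)    # illustrative resolved field
pert = rng.standard_normal(128)                 # illustrative small-scale field
u_new = add_scaled_perturbation(u, pert, target_energy=1e-4)
```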



Conclusions

A method to solve two-dimensional shallow-water flows using pseudo-spectral techniques was developed. Simulations using the method show physically consistent results. Based on this method, a high performance parallel pseudo-spectral code for the simulation of two-dimensional shallow-water flows was developed. The code is designed to perform high resolution, space- and time-accurate simulations of shallow-water flows on various distributed memory architectures. The parallel pseudo-spectral code is efficient on various architectures, giving good speedup on distributed memory systems (IBM SP2 and CRAY T3D).

Simulations performed with the code show that the interaction mechanisms between orography and the vorticity field lead to the development of small scale vortical motions that have a faster decay rate, at the expense of the energy of the large scales.

This kind of interaction cannot be accurately represented by a simple dissipative model, requiring more refined techniques such as upscaling.



Acknowledgements

We gratefully acknowledge the Center for Parallel Computations of the Federal University of Rio de Janeiro (NACAD-COPPE) for providing us time on the IBM SP2 and CRAY J90 machines. We are also indebted to the Silicon Graphics/Cray Research Division for the computer time on their CRAY T3D and T90 machines at Eagan, MN, USA. The first author is currently supported by a Research Grant from FAPERJ (E-26/151.018/97).



References

[1] Smagorinsky, J., General circulation experiments with the primitive equations. Mon. Weath. Rev., 91, 99-164 (1963).

[2] Germano, M., Piomelli, U., Moin, P. and Cabot, W. H., A dynamic subgrid-scale eddy viscosity model. Phys. Fluids A, 3, 1760-1765 (1991).

[3] Lilly, D. K., A proposed modification of the Germano subgrid-scale closure method. Phys. Fluids A, 4, 633-635 (1992).

[4] Lesieur, M., Turbulence in Fluids. Kluwer Academic Publishers, 2nd edition (1990).

[5] Mangiavacchi, N., Coutinho, A. L. G. A. and Ebecken, N. F. F., Parallel pseudo-spectral computations at NACAD. Proc. of SBAC-PAD, 1 (1997).



Article presented at the 1st Brazilian School on Transition and Turbulence, Rio de Janeiro, September 21-25, 1998.
Technical Editor: Atila P. Silva Freire.