Journal of the Brazilian Society of Mechanical Sciences
Print version ISSN 01007386
J. Braz. Soc. Mech. Sci. vol.22 n.1 Rio de Janeiro 2000
http://dx.doi.org/10.1590/S010073862000000100007
Turbulent ShallowWater Model for Orographic SubgridScale Perturbations
Norberto Mangiavacchi
Alvaro L. G. A. Coutinho
Nelson F. F. Ebecken
Center for Parallel Computations  COPPE/Federal University of Rio de Janeiro  PO Box 68506, RJ 21945970, Rio de Janeiro, Brazil
email: norberto, alvaro@coc.ufrj.br, nelson@ntt.ufrj.br
A parallel pseudospectral method for the simulation in distributed memory computers of the shallowwater equations in primitive form was developed and used on the study of turbulent shallowwaters LES models for orographic subgridscale perturbations. The main characteristics of the code are: momentum equations integrated in time using an accurate pseudospectral technique; Eulerian treatment of advective terms; and parallelization of the code based on a domain decomposition technique. The parallel pseudospectral code is efficient on various architectures. It gives high performance onvector computers and good speedup on distributed memory systems.
The code is being used for the study of the interaction mechanisms in shallowwater ows with regular as well as random orography with a prescribed spectrum of elevations. Simulations show the evolution of small scale vortical motions from the interaction of the large scale flow and the smallscale orographic perturbations. These interactions transfer energy from the largescale motions to the small (usually unresolved) scales. The possibility of including the parametrization of this effects in turbulent LES subgridstress models for the shallowwater equations is addressed.
Keywords: Turbulent flows, shallowwater equations, subgridscale models, orographic perturbations, parallel processing, spectral methods
Introduction
The shallowwater model is used as a testbench for understanding many fundamental dynamical problems, such as atmospheric flows, tides, storm surges, river and coastal flows. In many such applications, the flow is mostly a twodimensional turbulent flow.
Numerical simulations usually are performed using LargeEddies Simulation (LES) techniques, which require the parametrization of scales smaller than the computational grid, which is done by subgridscale models. The best known subgridscale model for LES of turbulent ows is the Smagorinsky model [1]. Later models, such as Germano[2], and Lilly[3] overcome some of Smagorinsky model limitations, in particular the need to specify a parameter, and the tendency of beeing overdissipative.
None of these models, however, address the effects of the perturbations introduced by an irregular orography with scales smaller than the computational grid. Small scale orographic perturbations play an important role in the dynamic cascade in turbulent shallowwater flows. Neglecting these effect can result in degraded forecast capabilities in atmospheric simulations. Therefore, better subgridscale models, including subgridscale orographic effects, may result in significantly improved weather forecasts.
Numerical simulations of the detailed behaviour of turbulent shallowwater flows in the presence of small scale orographic perturbations can provide a fundamental understanding of the mechanisms involved in such flows, and provide valuable data for the development and validation of new subgrid models. Such simulations require the space and time accurate integration of the turbulent flow field and additionally model accurately the effects due to the orography.
Since such simulations are very computationally intensive, higher resolution shallow water flow simulations could profit from the scalable performance available on parallel arquitectures. Pseudospectral methods, due to their high accuracy and performance when used in simple domains, and their intrinsic parallelism, can be applied successfully for the simulation of shallowwater flows in distributed memory computers. In this work we describe a new parallel pseudospectral code designed to perform high resolution, space and time accurate simulation of shallowwater flows on various current distributed memory architectures. The parallel algorithm explores the intrinsic parallelism of the pseudospectral method and it is based on a domain decomposition approach. The remainder of this paper is organized as follows. In the next section we briefly review the governing equations of shallowwater flows. The section that follows details our pseudospectral method, based on Fourier expansions. The next section presents the parallel implementation, designed to achieve high performance, while retaining portability across different platforms. The following section shows the numerical results for a test case and discusses the parallel performance of the code on several computers, and results of several simulations of the shallowwater equations with various orographic perturbations are discussed. Finally, the paper ends with a summary of the main conclusions of this work.
Governing Equations
The governing equations are the shallowwater equations, also known as Barré de SaintVenant equations. The derivation is given, for example, by Lesieur[4]. Here we will restate the main assumptions.
We start from the NavierStokes equations locally on a sphere, with a fluid of constant uniform density r . The fluid is assumed to have a free surface with elevation H(x, y, t) above a reference plane, and to lie above a topography of height h_{s}(x, y). The depth of the fluid layer is h(x; y; t), and the surface elevation is H = h + h_{s} (see Figure 1).
The pressure at the free surface is uniform and equal to p_{0}. The pressure is hydrostatically distributed along the vertical, and the horizontal velocity field v, depends only on the horizontal space variables x and y, and on the time. Integrating the equations along the vertical, one obtains
where , is the vorticity, f = 2W sin(q) is the projection of the planetary vorticity 2W , w = êW ê , and q is the local latitude.
The Numerical Method
We use a Fourier PseudoSpectral Method in order to discretize the system of partial differential equations into ordinary differential equations. The pseudospectral approach avoids the high cost of computing quadratic and higher terms in Fourier representation, which require convolutions, but still preserving the Spectral Convergence properties of standard spectral methods. The Pseudospectral approach gives the best computational cost/benefit for simple geometries, in particular for periodic domains.
We proceed using the expansion of the flow variables in Fourier series in the (x) and (y) horizontal directions,
The thickness h is updated using
The time integration for the transport equation is performed using operator splitting. The velocity field is solved using AdamsBashfort for the advection terms,
where
and CranckNickolson for the viscous terms,
To be able to perform high resolution simulations, which are required to perform direct numerical simulations, the code was parallelized using a domain decomposition approach, which is described next.
Parallel Algorithm
The parallel algorithm explores intrinsic parallelism of the pseudospectral technique. It is based on a domain decomposition approach in latu sensu. Three kinds of operations are involved in the parallel pseudospectral method: (1) computation of products in physical space, (2) inversion of the Poisson operators, computation of derivatives, and filtering operations in Fourier space, and (3) computation of the discrete twodimensional Fourier transforms. The first two kinds of operations can be performed without any communication, when the domain is partitioned among the processors, in physical space and in Fourier space respectively. The only part that requires communication is the computation of the the discrete twodimensional Fourier transforms. Here also, for portability, we have chosen to use the transposition approach, that allows to use highly optimized single processor onedimensional FFT routines, that are normally found in most architectures, and a transposition algorithm that can be easily ported to different distributed memory architectures.
The transposition algorithm used in the FFT is the so called direct transposition algorithm where at each stage of the communication algorithm each node sends to his pair all the data that has that node as final destination. At each stage the processor pairs are defined using a mapping of the processors onto a hypercube, and a relative addressing strategy. The number of stages is p1 where p is the number of processors. The transposition algorithm can be summarized as follows:

For i =1, 2, ..., p1 do:
Each node m collects and sends to n = XOR(m, i) all blocks that have node n as final destination, and replaces them with the blocks that receives from n.

Unshuffle the resulting data in each node.
Here XOR stands fo the exclusive or boolean operation. The Poisson operator is diagonal in Fourier space. After decomposition of the domain among the nodes, the Poisson operator will require O (N^{ 2}/p) computations per node, at each time step, and no communications. Here N is the resolution in each dimension. The twodimensional FFT's will require O ((2N^{ 2} log_{2} (N)/p) computations and (p1) bidirectional communications of length N^{ 2}/p^{2} when using the described transposition algorithm. Since the latency time is much shorter than the communication time for large problem sizes, the total communication time is essentially O (N^{ 2} / p^{2}). Therefore the ratio of communications to computations is O(1/log_{2} (N)). Except for a few global operations, all the communication are lumped in the transposition algorithm making it easily portable to other distributed memory computers. The implementation using PVM, as the one given in [5], can run on a CRAY T3D and an IBM SP2 without changes in the transposition algorithms, only requiring to load the appropriate machine dependent singleprocessor onedimensional FFT routines. To achieve better performance on the SP2, versions employing the MPI and MPL libraries were also implemented with minor additional effort, and considerable improvement in the performance.
Single Node and Parallel Performance
To evaluate the performance of the code in parallel machines, twodimensional computations were performed with resolutions ranging from 256x256 to 1024x1024.
High singleprocessor performance is obtained across various platforms using the provided optimized high performance libraries for the FFTs. The first and third cases in table 1 SP2 wide using FORTRAN and ESSL library respectively) show that singleprocessor performance in the IBM SP2 can be improved by a factor of 5 by using the ESSL library. On CRAY T3D implementation, the scalar FFT library subroutines CCFFT, SCFFT, and CSFFT were used. On the CRAY J90 and T90 version, the vector library FFT subroutines CFFTMLT, SCFFTM, and CSFFTM where used. Parallel performance was measured on a 4processor IBM SP2. and on a 32processor CRAY T3D. When using the PVM libraries on the SP2, the speedup curves show that the parallel efficiency drops to about 75% for two processors. This is caused by the transposition algorithm, which is not present in the single processor case, and that introduces a significant overhead even when using only 2 processors. However efficiency continues above 50% even for quite larger numbers of nodes, as long as the problem size is adequate. When using the dedicated MPL library on the SP2 we obtain some improvement in the efficiency (on two processors about 80% for 256 x256), showing that the communication cost is reduced by using the MPL library. In fact, the efficiency is about 50% even when using 32 T3D nodes for the case 1024 x1024. Hence, a CRAY T3D with 32 nodes outperforms a single CPU of a CRAY J90 running in vector mode. The scalability of the parallel code clearly indicates its potential to beat more powerful vector processors, as the T90, if more processors were added to the parallel machine.
Results
For the scope of this work we will analize the case of negligible rotation (f = 0). The following simulation were performed using a physical space resolution of 64 by 64 points, unless otherwise specified.
Test problem
A number of runs were performed to assess the correctness of the code. One usual test for shallowwater solvers is the dambreak problem. In this problem, there is sharp variation in the surface elevation H, in otherwise quiescent initial conditions. The elevation is H = H_{0} at the left, and H = H_{1} at the right of the dam. From this initial sharp front, two waves propagate in opposite directions, at two different speeds. The wave travelling in the shallower side is a shock wave, while the wave propagating on the deep side is a rarefaction wave. In this test it is assessed the ability to accurately simulate the propagation of sharp waves on the surface elevation. A typical result from this test is shown in figure 3.
A twodimensional extension to the dambreak problem was also simulated, in order to verify if the code accurately accounts for more complex twodimensional wave interactions. Results are shown in Figure 4.
Regular Orography
Some simulations were performed using regular orographic perturbations. In such simulations the orography is given by sinusoidal perturbations of the form
Two different kinds of initial conditions were analised. On the first case, shown in Figure 5, a single vortex with a scale larger than the orographic perturbation is left to evolve.
In the abscence of the orographic perturbations, the vortex undergoes a slow decay, since the backgroung viscosity is very small. In the presence of the orographicperturbation the vortex develops smallscale peaks, which are originated from interactions between the vorticity field and the orography. The occurrence of the peaks can be explained simply by conservation of potential vorticity (w + f) / h. When passing over deeps, h increases causing the increase of vorticity by vortex stretching, and the opposite occours over the hights.
On a second case, the initial conditions are given by a turbulent flow, as shown in Figure 6.
In this case, the interaction cause a faster rate of decay of the turbulent motions with scales close to the scale of the orographic perturbation than the case without perturbation.
The effect of the orographic perturbation can be better visualized looking at the energy spectra E(k) of the two flows, shown in Figure 7. Two kinds of effects can be seen. The first most ostensive effect is the forcing of modes close to the perturbation scale and its harmonics. The forcing causes the peaks observed in the spectrum at discrete points. The other effect is the reduction of energy at scales away from the forcing frequencies, in particular for the large scales (small k). This is the most important effect as far as parametrization of subridscale perturbations is concerned.
Irregular Orography
A more complex behaviour can be expected in the case that the orography has a continuous spectrum of perturbations. Two particular cases are of interest. In the first case the spectrum of perturbation is similar to the spectrum of the velocity, while in the second case the spectrum of the orographic perturbation is peaked at higher modes. The elevation spectra for this two cases are shown in Figure 8, and the resulting spatial distributions are shown in Figure 9.
The resulting kinetic energy spectra for the turbulent flow without disturbace, and for the two types of terrains are shown in Figure 10. While for the terrain 1 the effects of the orographic perturbations are already quite considerable, the effects due to terrain 2 are certainly dramatic, leading to a much higher kinetic energy level at the small scales, and a lower energy level at the large scales.
Considerations on Parametrization of Orographic Subgridscale Perturbations
From the results of the previous simulations it has been shown that the effect of the subgrid orographic perturbations is not a simple dissipative mechanism, acting instead in the full range of dynamic scales. The transfer of energy among scales is enhanced, and, in the case of of orography with a elevation spectrum containing perturbations with high energy at small scales, there is strong transfer of kinetic energy from large to small scales.
No simple dissipative model can reproduce this kind of mechanism with accuracy. In order to take into account these effects in large eddies simulations, one crude approach is to add a perturbation at the smallest resolved scales with the same energy than the subgrid scale perturbations. A very promising approach, however is to use upscaling techniques to accurately compute the energy transfer due to the smallest scales, and to use this information in the large scale model.
Conclusions
A method to solve twodimensional shallowwater flows using pseudospectral methods was developed. Results of simulations using the method show physically consistent results. Based on this method, a high performance parallel pseudospectral method for the simulation of twodimensional shallowwater flows was developed. The code is designed to perform high resolution, space and time accurate simulations of shallowwater flows on various distributed memory architectures. The parallel PseudoSpectral code is efficient on various architectures. It gives good speedup on distributed memory systems (IBM SP2 and T3D).
Simulations performed with the code show that the interaction mechanisms between orography and the vorticity field lead to the development of small scale vortical motions that have a faster decay rate, at the expense of the energy of the large scales.
This kind of interaction cannot be accurately represented by a simple dissipative model, requiring more refined techniques such as upscaling.
Acknowledgements
We gratefully acknowledge the Center of Parallel Computations of the Federal University of Rio de Janeiro (NACADCOPPE) for providing us time on the IBM SP2. and CRAY J90 machines. We are also indebted to Silicon Graphics/Cray Research Division by the computer time in their CRAY T3D and T90 machines at Eagan, MN, USA. The first author is currently supported by a Research Grant from FAPERJ (E26/151.018/97).
References
Smagorinsky. J., General circulation experiments with the primitive equations. Mon.Weath. Rev., 91, 99164 (1963). [ Links ]
Germano, M., Piomelli, U., Moin, P. and Cabot. W. C., A dynamical subgridscale eddy viscosity model. Phys. Fluids A, 3, 17601765 (1991). [ Links ]
Lilly, D. K., A proposed modification to the germano subgrid scale closure method. Phys. Fluids A, 4, 663 (1993). [ Links ]
Lesieur, M., Turbulence in Fluids. Kluwer Academic Publishers, 2^{nd} edition, (1990). [ Links ]
Mangiavacchi, N., Coutinho, A. L. G. A and Ebecken, N. F. F., Parallel pseudospectral computations at NACAD. Proc. of SBACPAD, 1 (1997). [ Links ]
Article presented at the 1st Brazilian School on Transition and Turbulence, Rio de Janeiro, September 2125, 1998.
Technical Editor: Atila P. Silva Freire.