
Parallel Implementation of a Two-level Algebraic ILU(k)-based Domain Decomposition Preconditioner

ABSTRACT

We discuss the parallel implementation of a two-level algebraic ILU(k)-based domain decomposition preconditioner using the PETSc library. We present strategies to improve performance and minimize communication among processes during setup and application phases. We compare our implementation with an off-the-shelf preconditioner in PETSc for solving linear systems arising in reservoir simulation problems, and show that for some cases our implementation performs better.

Keywords:
Two-level preconditioner; domain decomposition; Krylov methods; linear systems; parallelism; PETSc


1 INTRODUCTION

This paper discusses the formulation and the parallel implementation of an algebraic ILU(k)-based two-level domain decomposition preconditioner first introduced in [2].

In this work we present and discuss details of the implementation using the MPI-based PETSc suite [3], a set of data structures and routines for the parallel solution of scientific applications modeled by partial differential equations. We also present results of computational experiments involving matrices from oil reservoir simulation. We tested with different numbers of processes and compared the results with the default PETSc preconditioner, block Jacobi, which is a usual option in the oil industry.

Multilevel preconditioners have been an active research area for the last 30 years. One of the main representatives of this class is the algebraic multigrid (AMG) method [19], [22], [30], [31], when used as a preconditioner rather than as a solver. It is widely used in oil reservoir simulation as part of the CPR (constrained pressure residual) preconditioner [6], [33]. Despite its general acceptance there is room for new alternatives, as there are problems where AMG can perform poorly; see, for instance, [13]. Among these alternatives we find the two-level preconditioners. There are many variations within this family of preconditioners, but basically we can discern at least two subfamilies: algebraic [1], [4], [12], [14], [18], [20], [23], [28], [32] and operator dependent [15], [16], [17]. Within the operator-dependent preconditioners we should highlight the spectral methods [21], [25].

Incomplete LU factorization ILU(k) [26] has long been used as a preconditioner in reservoir simulation (an ingenious parallel implementation is discussed in [11]). Due to the difficulty in parallelizing ILU(k), it is quite natural to combine ILU(k) and block Jacobi, so much so that this combination constitutes PETSc's default parallel preconditioner [3]. The algorithm proposed in [2], whose parallel implementation we discuss in this article, seeks to combine the use of (sequential) ILU(k) with two ideas borrowed from domain decomposition methods: (i) the introduction of an interface that connects subdomains, allowing, as opposed to block Jacobi, for the interaction between subdomains to be taken into account, and (ii) the introduction of a second level, associated with a coarse version of the problem, that speeds up the resolution of low-frequency modes. These improvements come at the cost of greater communication, requiring a more involved parallel implementation.

The main contribution of the two-level preconditioner proposed in [2] is the fine preconditioner, as the coarse-level component is quite simple. Accordingly, the main contribution of our article is the development of a careful parallel implementation of that fine part; nonetheless, we also take care of the coarse part by proposing low-cost PETSc-based parallel codes for its construction and application. Besides the parallel implementation, we present and discuss a set of performance tests of this preconditioner for solving synthetic and real-world oil reservoir simulation problems.

The paper is organized as follows. Section 2 introduces the notation used throughout the paper. Section 3 describes in detail the proposed two-level algebraic ILU(k)-based domain decomposition preconditioner, which we call iSchur. Its parallel implementation is detailed in Section 4, including the communication layout and strategies to minimize data transfer between processes, while results of performance experiments and comparisons with another preconditioner are presented and discussed in Section 5. The conclusion is drawn in Section 6 along with some future work directions.

2 NOTATION

In this section, we introduce some notation that will be necessary to define iSchur.

Consider the linear system of algebraic equations

A x = b (2.1)

arising from the discretization of a system of partial differential equations (PDEs) modeling multiphase flow in porous media by a finite-difference scheme with n gridcells. We denote by $n_{\mathrm{dof}}$ the number of degrees of freedom (DOFs) per gridcell, so that A is a matrix of dimension $n_{\mathrm{dof}}\,n \times n_{\mathrm{dof}}\,n$. For instance, for the standard black-oil formulation, see [24], the DOFs are oil pressure, oil saturation and water saturation, so that $n_{\mathrm{dof}} = 3$.

It will be convenient to think of A as a block-matrix:

$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix},$$

where a block, denoted $a_{ij}$, is a matrix of dimension $n_{\mathrm{dof}} \times n_{\mathrm{dof}}$.

The block-sparsity pattern of A is determined by the stencil of the finite-difference scheme used for the discretization of the PDEs. Figure 1 depicts a bidimensional domain discretized by a 3×3 mesh and the sparsity pattern of the associated matrix for the black-oil model, assuming a five-point finite-difference stencil. We say that two gridcells i and j are neighbors if either one of the blocks $a_{ij}$ or $a_{ji}$ is not null. For the five-point stencil, gridcells are neighbors when they share an edge.

Figure 1:
Discretization grid and associated matrix, assuming three unknowns per gridcell and a five-point stencil.
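As an illustration of this structure, the short sketch below (our own toy example in Python with numpy/scipy, not code from the paper) assembles the block-sparsity pattern of Figure 1 for a 3×3 grid with a five-point stencil and $n_{\mathrm{dof}} = 3$.

```python
# Build the block-sparsity pattern of a 3x3 grid, five-point stencil, 3 DOFs per
# gridcell: A is 27x27 and every pair of neighboring cells contributes a dense
# 3x3 block (plus one diagonal block per cell).
import numpy as np
import scipy.sparse as sp

nx, ny, ndof = 3, 3, 3
cell = lambda i, j: i * ny + j                    # gridcell index in {0, ..., 8}

rows, cols = [], []
for i in range(nx):
    for j in range(ny):
        c = cell(i, j)
        nbrs = [c]                                # the diagonal block a_cc
        if i > 0:      nbrs.append(cell(i - 1, j))
        if i < nx - 1: nbrs.append(cell(i + 1, j))
        if j > 0:      nbrs.append(cell(i, j - 1))
        if j < ny - 1: nbrs.append(cell(i, j + 1))
        for nb in nbrs:                           # expand block a_{c,nb} into ndof x ndof entries
            for p in range(ndof):
                for q in range(ndof):
                    rows.append(c * ndof + p)
                    cols.append(nb * ndof + q)

pattern = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                        shape=(nx * ny * ndof, nx * ny * ndof)).tocsr()
print(pattern.nnz)                                # 9*9 + 24*9 = 297 nonzero entries
```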

We identify the gridcells with their respective indices, so that Ω denotes either the domain of the PDE or the set of indices {1, 2, …, n}. We introduce a disjoint partition of Ω, i.e., we break Ω = {1, 2, …, n} into P pieces $\Omega_1, \Omega_2, \ldots, \Omega_P$, in such a way that each gridcell belongs to exactly one subdomain $\Omega_J$, see Figure 2. More precisely, we define

$$\{\Omega_J\}_{1\le J\le P} \quad \text{s.t.} \quad \bigcup_{J=1}^{P}\Omega_J=\Omega \quad \text{and} \quad \Omega_I\cap\Omega_J=\emptyset \;\;\forall\, I\neq J. \qquad (2.2)$$

Figure 2:
2D Domain partitioned into 4 subdomains.

We note that there are gridcells that, while belonging to one subdomain, have neighbors in other subdomains. Our domain decomposition approach requires the definition of disconnected subdomains that can be dealt with in parallel. To that end, we define a separator set, called interface and denoted by Γ, and (disconnected) subdomain interiors $\Omega_J^{\mathrm{Int}}$. A gridcell j is said to belong to Γ if it is in a subdomain $\Omega_J$ while being a neighbor of at least one gridcell in another subdomain with greater index, i.e., $\Omega_K$ with K > J:

$$\Gamma=\{\, j\in\Omega \mid \exists\, k,J,K \ \text{s.t.}\ j\in\Omega_J,\ k\in\Omega_K,\ K>J,\ (a_{jk}\neq 0 \ \text{or}\ a_{kj}\neq 0)\,\}. \qquad (2.3)$$

We now define the subdomain interior $\Omega_J^{\mathrm{Int}}$ as the portion of $\Omega_J$ not in Γ:

$$\Omega_J^{\mathrm{Int}}=\Omega_J\setminus\Gamma, \qquad (2.4)$$

see Figure 2. These definitions are just enough to ensure that if gridcell j is in $\Omega_J^{\mathrm{Int}}$ and gridcell k is in $\Omega_K^{\mathrm{Int}}$, with $J \neq K$, then they are not neighbors.¹ Indeed, to fix the notation, assume $J < K$. Since $j \in \Omega_J^{\mathrm{Int}} \subset \Omega_J$ and $k \in \Omega_K^{\mathrm{Int}} \subset \Omega_K$, if j and k were neighbors, j would be in Γ by definition (2.3) and therefore not in $\Omega_J^{\mathrm{Int}}$.

We now define the local interface $\Gamma_J$ associated with each subdomain $\Omega_J$ as the intersection of Γ and $\Omega_J$, or equivalently,

$$\Gamma_J=\{\, j\in\Omega_J \mid \exists\,(K>J \ \text{and}\ k\in\Omega_K) \ \text{s.t.}\ (a_{jk}\neq 0\ \text{or}\ a_{kj}\neq 0)\,\}. \qquad (2.5)$$

Notice that $\{\Gamma_J\}_{1\le J\le P}$ form a disjoint partition of Γ. See Figure 2.

Finally, we define extended subdomains $\bar{\Omega}_J$ and extended local interfaces $\bar{\Gamma}_J$, which incorporate the portions of the interface connected to the subdomain interior $\Omega_J^{\mathrm{Int}}$. We define $\bar{\Omega}_J$ as

$$\bar{\Omega}_J=\Omega_J\cup\{\, k\in\Gamma \mid \exists\, j\in\Omega_J^{\mathrm{Int}} \ \text{s.t.}\ (a_{jk}\neq 0\ \text{or}\ a_{kj}\neq 0)\,\}, \qquad (2.6)$$

and $\bar{\Gamma}_J$ as its restriction to Γ, i.e., $\bar{\Gamma}_J=\Gamma\cap\bar{\Omega}_J$, see Figure 2.

Notice that $\Gamma_J\subset\bar{\Gamma}_J\subset\Gamma$. We point out that $\bar{\Gamma}_J$ is the result of augmenting $\Gamma_J$ with the gridcells of the interface Γ that are neighbors of gridcells in $\Omega_J^{\mathrm{Int}}$. We refer to $\bar{\Gamma}_J$ as an extended interface, see Figure 2.
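To make these set definitions concrete, the sketch below (our own illustration; the names `adj` and `part` are assumptions, and a structurally symmetric adjacency is assumed) computes Γ, the subdomain interiors, the local interfaces and the extended interfaces from a gridcell adjacency list and a partition.

```python
# Classify gridcells according to definitions (2.3)-(2.6).
# adj[j]  : set of gridcells that are neighbors of gridcell j (symmetric)
# part[j] : index in {0, ..., P-1} of the subdomain that owns gridcell j
def classify(adj, part, P):
    cells = range(len(part))
    # Gamma: cells with a neighbor in a subdomain of greater index, as in (2.3)
    Gamma = {j for j in cells if any(part[k] > part[j] for k in adj[j])}
    interior = [set() for _ in range(P)]        # Omega_J^Int = Omega_J \ Gamma
    local_if = [set() for _ in range(P)]        # Gamma_J = Gamma intersected with Omega_J
    for j in cells:
        (local_if if j in Gamma else interior)[part[j]].add(j)
    # extended interface: Gamma_J plus interface cells neighboring Omega_J^Int
    ext_if = [set(local_if[J]) for J in range(P)]
    for J in range(P):
        for j in interior[J]:
            ext_if[J].update(k for k in adj[j] if k in Gamma)
    return Gamma, interior, local_if, ext_if
```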

If the equations/variables are reordered, starting with the ones corresponding to $\Omega_1^{\mathrm{Int}}$, followed by the other $\Omega_J^{\mathrm{Int}}$ and finally by Γ, then A has the following block structure:

$$A=\begin{bmatrix} A_{11} & & & A_{1\Gamma}\\ & \ddots & & \vdots\\ & & A_{PP} & A_{P\Gamma}\\ A_{\Gamma 1} & \cdots & A_{\Gamma P} & A_{\Gamma\Gamma}\end{bmatrix}, \qquad (2.7)$$

where the submatrices $A_{JJ}$ contain the rows and columns of A associated with $\Omega_J^{\mathrm{Int}}$, $A_{\Gamma\Gamma}$ the ones associated with Γ, $A_{J\Gamma}$ the rows associated with $\Omega_J^{\mathrm{Int}}$ and the columns associated with Γ, and $A_{\Gamma J}$ the rows associated with Γ and the columns associated with $\Omega_J^{\mathrm{Int}}$. Therefore, denoting by |S| the number of elements in a set S, the dimensions of $A_{JJ}$, $A_{\Gamma\Gamma}$, $A_{J\Gamma}$ and $A_{\Gamma J}$ are, respectively, $n_{\mathrm{dof}} n_J \times n_{\mathrm{dof}} n_J$, $n_{\mathrm{dof}} m \times n_{\mathrm{dof}} m$, $n_{\mathrm{dof}} n_J \times n_{\mathrm{dof}} m$, and $n_{\mathrm{dof}} m \times n_{\mathrm{dof}} n_J$, where $n_J=|\Omega_J^{\mathrm{Int}}|$ and $m=|\Gamma|$. It is important to notice that, since $a_{jk}=0$ for any $j\in\Omega_J^{\mathrm{Int}}$ and $k\in\Omega_K^{\mathrm{Int}}$ with $J\neq K$, the submatrices $A_{JK}$ of rows associated with $\Omega_J^{\mathrm{Int}}$ and columns associated with $\Omega_K^{\mathrm{Int}}$ are all null (and therefore omitted in (2.7)). This block-diagonal structure of the leading portion of the matrix (which encompasses most of the variables/equations) allows for efficient parallelization of many tasks, and was the ultimate motivation for the definitions above. We point out that, although not required by the method, the implementation discussed in this work assumes that the matrix A is structurally symmetric²; in this case the matrices $A_{JJ}$ and $A_{\Gamma\Gamma}$ are also structurally symmetric. Furthermore, $A_{J\Gamma}$ and $A_{\Gamma J}^T$ have the same nonzero pattern.

3 DESCRIPTION OF THE TWO-LEVEL PRECONDITIONER

The goal of a preconditioner is to replace the linear system (2.1) by one of the equivalent ones:

$$(MA)\,x=Mb \ \ \text{(left preconditioned)} \qquad \text{or} \qquad (AM)\,y=b \ \ \text{(right preconditioned, where } x=My\text{)}.$$

A good preconditioner should render AM or MA much better conditioned than A (i.e., M should approximate, in some sense, $A^{-1}$) and, in order to be of practical interest, its construction and application must not be overly expensive.

We now present our preconditioner, which combines ideas from domain decomposition methods (DDM) and level-based incomplete LU factorization, ILU(k). In DDM, one tries to build an approximation to the action of $A^{-1}$ based on (approximations of) inverses of smaller, local versions of A (the subdomain-interior submatrices $A_{JJ}$, in our case). In this work, the action of the local inverses is approximated by ILU($k_{\mathrm{Int}}$), while the other components required by the preconditioner (see equation (3.4)) are approximated with different levels ($k_{\mathrm{Bord}}$, $k_{\mathrm{Prod}}$ or $k_\Gamma$). The motivation for this approach is that ILU(k) has been widely used with success as a preconditioner in reservoir simulation, and DDM has been shown to be an efficient and scalable technique for parallel architectures.

It is well established that in linear systems associated with parabolic problems, the high-frequency components of the error are damped quickly, while the low-frequency ones take many iterations to fade, see [31]. In reservoir simulation, the pressure unknown is of parabolic nature. A two-level preconditioner tackles this problem by combining a "fine" preconditioner component $M_F$, like the one mentioned in the previous paragraph, with a coarse component $M_C$, the purpose of which is to annihilate the projection of the error onto a coarse space (associated with the low frequencies). If the two components are combined multiplicatively (see [29]), the resulting two-level preconditioner is

$$M = M_F + M_C - M_F A M_C. \qquad (3.1)$$
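For concreteness, applying (3.1) to a residual vector r can be written as below; `apply_MF` and `apply_MC` are placeholder callables standing for the components defined in the next two subsections (this is an illustrative sketch, not the implementation of Section 4).

```python
# z = M r with the multiplicative combination M = M_F + M_C - M_F A M_C of (3.1).
def apply_two_level(A, apply_MF, apply_MC, r):
    z_c = apply_MC(r)                      # coarse correction  M_C r
    return apply_MF(r) + z_c - apply_MF(A @ z_c)
```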

In Subsection 3.1 we present an ILU(k)-based fine component $M_F$ and in Subsection 3.2 we describe a simple coarse component.

3.1 ILU(k) based Domain Decomposition

The fine component of the domain decomposition preconditioner is based on the following block LU factorization of A:

$$A=LU=\begin{bmatrix} L_1 & & & \\ & \ddots & & \\ & & L_P & \\ B_1 & \cdots & B_P & I\end{bmatrix}\begin{bmatrix} U_1 & & & C_1\\ & \ddots & & \vdots\\ & & U_P & C_P\\ & & & S\end{bmatrix}, \qquad (3.2)$$

where $A_{JJ}=L_J U_J$ is the LU factorization of $A_{JJ}$, $B_J=A_{\Gamma J}U_J^{-1}$, $C_J=L_J^{-1}A_{J\Gamma}$, and

$$S=A_{\Gamma\Gamma}-\sum_{J=1}^{P}A_{\Gamma J}A_{JJ}^{-1}A_{J\Gamma}=A_{\Gamma\Gamma}-\sum_{J=1}^{P}B_J C_J, \qquad (3.3)$$

is the Schur complement of A with respect to the interior points.
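The following dense numpy check, a toy example of our own with P = 2 and small diagonally dominant blocks, verifies that the factors of (3.2), built with the Schur complement (3.3), reproduce A.

```python
import numpy as np

def lu_nopivot(M):
    # Doolittle LU without pivoting; adequate for the diagonally dominant blocks below
    n = M.shape[0]
    L, U = np.eye(n), M.astype(float).copy()
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    return L, U

rng = np.random.default_rng(0)
n1, n2, m = 4, 5, 3
A11 = 5 * np.eye(n1) + rng.random((n1, n1)); A22 = 5 * np.eye(n2) + rng.random((n2, n2))
A1G, A2G = rng.random((n1, m)), rng.random((n2, m))
AG1, AG2 = rng.random((m, n1)), rng.random((m, n2))
AGG = 5 * np.eye(m) + rng.random((m, m))
Z12, Z1G, Z21, Z2G = (np.zeros(s) for s in [(n1, n2), (n1, m), (n2, n1), (n2, m)])
A = np.block([[A11, Z12, A1G], [Z21, A22, A2G], [AG1, AG2, AGG]])   # structure (2.7)

L1, U1 = lu_nopivot(A11); L2, U2 = lu_nopivot(A22)
B1, B2 = AG1 @ np.linalg.inv(U1), AG2 @ np.linalg.inv(U2)   # B_J = A_{Gamma J} U_J^{-1}
C1, C2 = np.linalg.solve(L1, A1G), np.linalg.solve(L2, A2G) # C_J = L_J^{-1} A_{J Gamma}
S = AGG - B1 @ C1 - B2 @ C2                                 # Schur complement (3.3)

L = np.block([[L1, Z12, Z1G], [Z21, L2, Z2G], [B1, B2, np.eye(m)]])
U = np.block([[U1, Z12, C1], [Z21, U2, C2], [Z1G.T, Z2G.T, S]])
print(np.allclose(L @ U, A))                                # True: A = LU as in (3.2)
```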

From the decomposition in (3.2), we can show that the inverse of A is

$$A^{-1}=\begin{bmatrix} U_1^{-1} & & & \\ & \ddots & & \\ & & U_P^{-1} & \\ & & & I\end{bmatrix}\begin{bmatrix} I & & & -C_1\\ & \ddots & & \vdots\\ & & I & -C_P\\ & & & I\end{bmatrix}\begin{bmatrix} I & & & \\ & \ddots & & \\ & & I & \\ & & & S^{-1}\end{bmatrix}\begin{bmatrix} I & & & \\ & \ddots & & \\ & & I & \\ -B_1 & \cdots & -B_P & I\end{bmatrix}\begin{bmatrix} L_1^{-1} & & & \\ & \ddots & & \\ & & L_P^{-1} & \\ & & & I\end{bmatrix}. \qquad (3.4)$$

We want to define a preconditioner $M_F$ approximating the action of $A^{-1}$ on a vector. Therefore, we need to define suitable approximations for $L_J^{-1}$, $U_J^{-1}$, $B_J$ and $C_J$, J=1,…,P, and for $S^{-1}$. These approximations are denoted by $\tilde{L}_J^{-1}$, $\tilde{U}_J^{-1}$, $\tilde{B}_J$, $\tilde{C}_J$, and $S_F^{-1}$, respectively. The fine preconditioner is then defined as

$$M_F=\begin{bmatrix} \tilde{U}_1^{-1} & & & \\ & \ddots & & \\ & & \tilde{U}_P^{-1} & \\ & & & I\end{bmatrix}\begin{bmatrix} I & & & -\tilde{C}_1\\ & \ddots & & \vdots\\ & & I & -\tilde{C}_P\\ & & & I\end{bmatrix}\begin{bmatrix} I & & & \\ & \ddots & & \\ & & I & \\ & & & S_F^{-1}\end{bmatrix}\begin{bmatrix} I & & & \\ & \ddots & & \\ & & I & \\ -\tilde{B}_1 & \cdots & -\tilde{B}_P & I\end{bmatrix}\begin{bmatrix} \tilde{L}_1^{-1} & & & \\ & \ddots & & \\ & & \tilde{L}_P^{-1} & \\ & & & I\end{bmatrix}. \qquad (3.5)$$

In the remainder of this subsection, we describe precisely how these approximations are chosen.

First we define $\tilde{L}_J$ and $\tilde{U}_J$ as the result of the incomplete LU factorization of $A_{JJ}$ with level of fill $k_{\mathrm{Int}}$, $[\tilde{L}_J,\tilde{U}_J]=\mathrm{ILU}(A_{JJ},k_{\mathrm{Int}})$. In our numerical experiments, we used $k_{\mathrm{Int}}=1$, which is a usual choice in reservoir simulation. Even though $\tilde{L}_J$ and $\tilde{U}_J$ are sparse, $\tilde{L}_J^{-1}A_{J\Gamma}$ and $\tilde{U}_J^{-T}A_{\Gamma J}^T$ (which would approximate $C_J$ and $B_J^T$) are not. To ensure sparsity, $\tilde{C}_J\approx\tilde{L}_J^{-1}A_{J\Gamma}$ is defined as the result of the incomplete triangular solve with multiple right-hand sides of the system $\tilde{L}_J\tilde{C}_J=A_{J\Gamma}$, obtained by extending the definition of level of fill as follows. Let $v_l$ and $w_l$ be the sparse vectors corresponding to the l-th columns of $A_{J\Gamma}$ and $\tilde{C}_J$, respectively, so that $\tilde{L}_J w_l=v_l$. Based on the solution of triangular systems by forward substitution, the components of $v_l$ and $w_l$ are related by

$$(w_l)_k=(v_l)_k-\sum_{i=1}^{k-1}(\tilde{L}_J)_{ki}\,(w_l)_i. \qquad (3.6)$$

The level of fill-in of component k of $w_l$ is defined recursively as

$$\mathrm{Lev}\big((w_l)_k\big)=\min\Big\{\mathrm{Lev}\big((v_l)_k\big),\ \min_{1\le i\le k-1}\big\{\mathrm{Lev}\big((\tilde{L}_J)_{ki}\big)+\mathrm{Lev}\big((w_l)_i\big)+1\big\}\Big\}, \qquad (3.7)$$

where $\mathrm{Lev}((v_l)_k)=0$ when $(v_l)_k\neq 0$ and $\mathrm{Lev}((v_l)_k)=\infty$ otherwise, and $\mathrm{Lev}((\tilde{L}_J)_{ki})$ is the level of fill of entry ki in the ILU($k_{\mathrm{Int}}$) decomposition of $A_{JJ}$ when $(\tilde{L}_J)_{ki}\neq 0$ and $\mathrm{Lev}((\tilde{L}_J)_{ki})=\infty$ otherwise. The approximation $\tilde{C}_J$ to $C_J=L_J^{-1}A_{J\Gamma}$ is then obtained by an incomplete forward substitution (IFS) with level of fill $k_{\mathrm{Bord}}$, in which we do not compute any terms with level of fill greater than $k_{\mathrm{Bord}}$ during the forward substitution process. We denote $\tilde{C}_J=\mathrm{IFS}(\tilde{L}_J,A_{J\Gamma},k_{\mathrm{Bord}})$. Notice that when $k_{\mathrm{Bord}}=k_{\mathrm{Int}}$, $\tilde{C}_J$ is what would result from a standard partial incomplete factorization of A. The approximation $\tilde{B}_J\approx A_{\Gamma J}\tilde{U}_J^{-1}$ to $B_J$ is defined analogously.
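A possible realization of this incomplete forward substitution for a single column, following our reading of (3.6)-(3.7) (the names below are ours, not the paper's), is sketched next; entries and their levels are kept in dictionaries.

```python
import math

def ifs_column(L, lev_L, v, k_bord):
    """Incomplete forward substitution for one sparse column.
    L      : indexable lower-triangular factor, entry L[k][i]
    lev_L  : dict {(k, i): level} with the ILU(k_Int) levels of L; absent means infinity
    v      : dict {k: value} with the nonzeros of the right-hand-side column
    returns: dict {k: value} keeping only components with level <= k_bord"""
    n = len(L)
    w, lev_w = {}, {}
    for k in range(n):
        # level of component k, following (3.7)
        lev = 0 if k in v else math.inf
        for i in w:
            if i < k and (k, i) in lev_L:
                lev = min(lev, lev_L[(k, i)] + lev_w[i] + 1)
        if lev > k_bord:
            continue                      # component dropped: level of fill too high
        # value of component k, following (3.6), using only the kept entries of w
        val = v.get(k, 0.0)
        for i, wi in w.items():
            if i < k and (k, i) in lev_L:
                val -= L[k][i] * wi
        w[k], lev_w[k] = val, lev
    return w
```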

Similarly, in order to define an approximation $\tilde{S}$ to S, we start by defining $F_J=\tilde{B}_J\tilde{C}_J$ and a level of fill for the entries of $F_J$,

$$\mathrm{Lev}\big((F_J)_{kl}\big)=\min\Big\{\mathrm{Lev}\big((A_{\Gamma\Gamma})_{kl}\big),\ \min_{1\le i\le m}\big\{\mathrm{Lev}\big((\tilde{B}_J)_{ki}\big)+\mathrm{Lev}\big((\tilde{C}_J)_{il}\big)+1\big\}\Big\}, \qquad (3.8)$$

where $m=\#\Omega_J^{\mathrm{Int}}$ is the number of columns in $\tilde{B}_J$ and rows in $\tilde{C}_J$, $\mathrm{Lev}((A_{\Gamma\Gamma})_{kl})=0$ when $(A_{\Gamma\Gamma})_{kl}\neq 0$ and $\mathrm{Lev}((A_{\Gamma\Gamma})_{kl})=\infty$ otherwise, and $\mathrm{Lev}((\tilde{C}_J)_{il})$ is the level of fill according to definition (3.7) when $(\tilde{C}_J)_{il}\neq 0$ and $\mathrm{Lev}((\tilde{C}_J)_{il})=\infty$ otherwise ($\mathrm{Lev}((\tilde{B}_J)_{ki})$ is defined in the same way). Next, $\tilde{F}_J$ is defined as the matrix obtained by keeping only the entries of $F_J$ with level less than or equal to $k_{\mathrm{Prod}}$ according to (3.8). We refer to this incomplete product (IP) as $\tilde{F}_J=\mathrm{IP}(\tilde{B}_J,\tilde{C}_J,k_{\mathrm{Prod}})$.

$\tilde{S}$ is then defined as

$$\tilde{S}=A_{\Gamma\Gamma}-\sum_{J=1}^{P}\tilde{F}_J. \qquad (3.9)$$
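A companion sketch of the incomplete product, again our interpretation of (3.8) rather than the authors' code, computes $F_J=\tilde{B}_J\tilde{C}_J$ sparsely and keeps an entry only when its level does not exceed $k_{\mathrm{Prod}}$; the kept matrices are then subtracted from $A_{\Gamma\Gamma}$ as in (3.9).

```python
import math

def incomplete_product(B, lev_B, C, lev_C, lev_AGG, k_prod):
    """B, C: dicts {(row, col): value}; lev_*: dicts {(row, col): level}, absent = infinity.
    Returns the entries of F = B @ C whose level per (3.8) is <= k_prod."""
    C_by_row = {}
    for (i, l), c in C.items():
        C_by_row.setdefault(i, []).append((l, c))
    F, lev_F = {}, {}
    for (k, i), b in B.items():
        for l, c in C_by_row.get(i, []):
            F[(k, l)] = F.get((k, l), 0.0) + b * c
            lev = min(lev_AGG.get((k, l), math.inf),
                      lev_B[(k, i)] + lev_C[(i, l)] + 1)
            lev_F[(k, l)] = min(lev_F.get((k, l), math.inf), lev)
    return {kl: val for kl, val in F.items() if lev_F[kl] <= k_prod}
```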

While $\tilde{S}$ approximates S, we still need to define an approximation for $S^{-1}$. Since $\tilde{S}$ is defined on the global interface Γ, it is not practical to perform ILU on it. Instead, we follow the approach employed in [7] and define for each subdomain a local version of $\tilde{S}$,

$$\tilde{S}_J=R_J\tilde{S}R_J^T, \qquad (3.10)$$

where $R_J:\Gamma\to\bar{\Gamma}_J$ is a restriction operator such that $\tilde{S}_J$ is the result of pruning $\tilde{S}$ so that only the rows and columns associated with $\bar{\Gamma}_J$ remain. More precisely, if $\{i_1,i_2,\ldots,i_{n_{\bar{\Gamma}_J}}\}$ is a list of the nodes in Γ that belong to $\bar{\Gamma}_J$, then the k-th row of $R_J$ is $e_{i_k}^T$, the $i_k$-th row of the $n_\Gamma\times n_\Gamma$ identity matrix,

$$R_J=\begin{bmatrix} e_{i_1}^T\\ \vdots\\ e_{i_{n_{\bar{\Gamma}_J}}}^T\end{bmatrix}.$$

Finally, our approximation $S_F^{-1}$ to $S^{-1}$ is defined as

$$S_F^{-1}=\sum_{J=1}^{P}T_J\left(L_{\tilde{S}_J}U_{\tilde{S}_J}\right)^{-1}R_J\approx\sum_{J=1}^{P}T_J\tilde{S}_J^{-1}R_J, \qquad (3.11)$$

where $L_{\tilde{S}_J}$ and $U_{\tilde{S}_J}$ are given by ILU($k_\Gamma$) of $\tilde{S}_J$. Here $T_J:\bar{\Gamma}_J\to\Gamma$ is an extension operator that takes the values of a vector defined on $\bar{\Gamma}_J$, scales them by $w_1^J,\ldots,w_{n_{\bar{\Gamma}_J}}^J$ (which we call weights), and places them in the corresponding positions of a vector defined on Γ. Therefore, using the same notation as before, the k-th column of $T_J$ is $w_k^J e_{i_k}$,

$$T_J=\begin{bmatrix} w_1^J e_{i_1} & \cdots & w_{n_{\bar{\Gamma}_J}}^J e_{i_{n_{\bar{\Gamma}_J}}}\end{bmatrix}.$$

Different choices of weights give rise to different options for $T_J$; our choice avoids communication among processes and is defined as follows:

$$w_k^J=\begin{cases}1, & \text{if the } i_k\text{-th node of } \Gamma \text{ belongs to } \Gamma_J,\\ 0, & \text{if the } i_k\text{-th node of } \Gamma \text{ belongs to } \bar{\Gamma}_J\setminus\Gamma_J.\end{cases}$$

We note that the $T_J$, J=1,...,P, form a partition of unity. Our approach is based on the Restricted Additive Schwarz (RAS) method proposed in [5].
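In the implementation, $R_J$ and $T_J$ need not be stored as matrices; the sketch below (our illustration, assuming the interface index sets of Section 2 are available) realizes $R_J$ as a gather and $T_J$ as a weighted scatter, so that applying $S_F^{-1}$ amounts to summing `T(local_solve(R(z_gamma)))` over the subdomains, where `local_solve` stands for the local ILU($k_\Gamma$) solve with $\tilde{S}_J$.

```python
import numpy as np

def make_R_T(ext_if_J, local_if_J, n_gamma):
    """ext_if_J: global interface indices of the nodes in Gamma_bar_J;
    local_if_J: the subset belonging to Gamma_J; n_gamma: size of Gamma."""
    idx = np.array(sorted(ext_if_J))                   # i_1, ..., i_{n_{Gamma_bar_J}}
    w = np.array([1.0 if i in local_if_J else 0.0 for i in idx])   # RAS weights
    R = lambda z_gamma: z_gamma[idx]                   # R_J z: gather
    def T(y):                                          # T_J y: scale and scatter
        z = np.zeros(n_gamma)
        z[idx] = w * y
        return z
    return R, T
```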

3.2 Coarse Space Correction

We define a coarse space spanned by the columns of an $(n\times P)$ matrix that we call $R_0^T$. The J-th column of $R_0^T$ is associated with the extended subdomain $\bar{\Omega}_J$ and its i-th entry is

$$(R_0^T)_{iJ}=\begin{cases}0, & \text{if node } i \text{ is not in } \bar{\Omega}_J,\\ 1/\mu_{\Omega_i}, & \text{if node } i \text{ is in } \bar{\Omega}_J,\end{cases}$$

where the i-th entry of $\mu_\Omega$, denoted $\mu_{\Omega_i}$, counts how many extended subdomains the i-th node belongs to. Notice that $(R_0^T)_{iJ}=1$ for all $i\in\Omega_J^{\mathrm{Int}}$, and that the columns of $R_0^T$ form a partition of unity, in the sense that their sum is a vector with all entries equal to 1.

We define $M_C$ by the formula

$$M_C=R_0^T\left(R_0 A R_0^T\right)^{-1}R_0. \qquad (3.12)$$

Notice that this definition ensures that $M_C A$ is a projection onto range($R_0^T$), and for A symmetric positive definite this projection is A-orthogonal. Since $R_0 A R_0^T$ is small ($P\times P$), we use exact LU (rather than ILU) when applying its inverse.
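A serial, dense sketch of the coarse component follows (our own simplification: in the parallel code of Section 4, $R_0 A R_0^T$ is assembled in parallel and its LU factorization is held redundantly by every process).

```python
import numpy as np

def build_R0T(ext_domains, n):
    """ext_domains[J]: set of node indices of the extended subdomain Omega_bar_J."""
    P = len(ext_domains)
    R0T = np.zeros((n, P))
    mu = np.zeros(n)                       # mu_Omega_i: number of extended subdomains containing i
    for J, dom in enumerate(ext_domains):
        for i in dom:
            R0T[i, J] = 1.0
            mu[i] += 1.0
    return R0T / mu[:, None]               # rows sum to 1: partition of unity

def apply_MC(A, R0T, r):
    Ac = R0T.T @ A @ R0T                   # P x P coarse operator R_0 A R_0^T
    return R0T @ np.linalg.solve(Ac, R0T.T @ r)   # M_C r, equation (3.12)
```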

Summing up, the complete preconditioner has two components: one related to the complete grid, called $M_F$ in equation (3.5), and another related to the coarse space (3.12). It is possible to subtract a third term that improves the performance of the whole preconditioner, although it increases the computational cost. The combined preconditioners are written as

$$M = M_F + M_C \qquad (3.13)$$

or

$$M = M_F + M_C - M_F A M_C. \qquad (3.14)$$

This formulation implies that the preconditioners will be applied additively (3.13) or multiplicatively (3.1), and they can be interpreted as having two levels, see [7]. In the following sections we refer to this two-level algebraic ILU(k)-based domain decomposition preconditioner as iSchur (for "incomplete Schur").

4 PARALLEL IMPLEMENTATION

In this section we discuss the parallel implementation of the preconditioner, considering data locality, communication among processes, and strategies to avoid communication.

We use a distributed-memory model, so the matrix A is distributed among the processes. As we use PETSc, the rows corresponding to the elements of each subdomain, including its interior and local interface, reside in the same process; see Figure 3 for a four-subdomain representation.

Figure 3:
Subdomain and matrix distribution among processes.

4.1 Preconditioner Setup

Algorithm 1 describes the construction of the fine and coarse components of the preconditioner.

Algorithm 1:
Preconditioner Setup

Step 1 of Algorithm 1 is the incomplete LU factorization. Because each subdomain is completely loaded in only one process and there are no connections between subdomain interiors, this step can be done in parallel in each process without communication. We use a native, optimized PETSc routine that computes the incomplete LU factorization.

In Step 2, $\tilde{C}_J=\mathrm{IFS}(\tilde{L}_J,A_{J\Gamma},0)$ computes an approximation $\tilde{C}_J\approx\tilde{L}_J^{-1}A_{J\Gamma}$ through a level-zero incomplete triangular solve applied to the columns of $A_{J\Gamma}$. For each column $a_i$ of $A_{J\Gamma}$, we take $\mathcal{I}_i$, the set of indices of the nonzero elements of $a_i$. Then we solve $c_i(\mathcal{I}_i)=\tilde{L}_J(\mathcal{I}_i,\mathcal{I}_i)\backslash a_i(\mathcal{I}_i)$, which is a small dense triangular system, where $c_i$ is the i-th column of $\tilde{C}_J$. See Figure 4. The light gray rows and columns represent the $\mathcal{I}_i$ indices applied to the rows and columns of the matrix $\tilde{L}_J$, and the dark gray elements are the ones in $\tilde{L}_J(\mathcal{I}_i,\mathcal{I}_i)$. We use a PETSc routine to extract the submatrix $\tilde{L}_J(\mathcal{I}_i,\mathcal{I}_i)$ as a small dense matrix, use our own dense triangular solver to compute $c_i(\mathcal{I}_i)$, which is a small dense vector, and then use another PETSc routine to store it in the correct positions of $c_i$, which is a column of the sparse matrix $\tilde{C}_J$. We observe that the nonzero columns of $A_{J\Gamma}$ that take part in these operations, in each subdomain, are related to the extended interface, but are stored locally.

Figure 4:
Incomplete Forward Substitution.

Note that $\tilde{L}_J$ and $A_{J\Gamma}$ (see Equation (2.5)) are in process J, as can be seen in Figure 3, which means that Step 2 is done locally without any communication.
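The per-column kernel of Step 2 could be written as follows, with scipy standing in for the PETSc routines mentioned above (a sketch under the assumptions that the factor has a unit diagonal and that the sparse column has sorted indices).

```python
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def ifs0_column(L_csr, a_i):
    """L_csr: sparse lower-triangular factor; a_i: one sparse (CSC) column of A_{J Gamma}."""
    I = a_i.indices                                   # nonzero row indices of a_i (sorted)
    L_dense = L_csr[I, :][:, I].toarray()             # gather the small dense block L(I, I)
    c_small = solve_triangular(L_dense, a_i.data.copy(),
                               lower=True, unit_diagonal=True)
    # scatter the small dense result back into a sparse column of C_tilde_J
    return sp.csc_matrix((c_small, I, [0, len(I)]), shape=a_i.shape)
```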

Step 3 computes $\tilde{B}_J$, also through the incomplete forward substitution, using the same idea described for Step 2, but $A_{\Gamma J}$ is not stored entirely in process J. Actually, $A_{\Gamma J}$ has nonzero elements in the rows associated with the extended interface $\bar{\Gamma}_J$; therefore the elements of $A_{\Gamma J}$ are distributed among process J and its neighbors. So each process has to communicate only with its own neighbors to compute $\tilde{B}_J$. The matrix $\tilde{B}_J$ is stored locally in process J.

The communication layout of Step 3 is illustrated in an example for four subdomains in Figure 5, where the arrows show the data transfer flow. In this example, using, for instance, five-point centered finite differences, each subdomain has at most two neighbors, but in different 2-D or 3-D grids and with different discretization schemes, each subdomain can have more neighbors.

Figure 5:
Domain distribution among processes.

Step 4 computes the approximate Schur complement through an incomplete product, as described by equation (3.9). The submatrix $A_{\Gamma\Gamma}$ is formed by the subdomain interfaces and is therefore distributed among the processes. Each process computes locally its contribution $\mathrm{IP}(\tilde{B}_J,\tilde{C}_J,k_{\mathrm{Prod}})$ and subtracts it globally from $A_{\Gamma\Gamma}$ to build up $\tilde{S}$, which is distributed among the processes in the same way $A_{\Gamma\Gamma}$ is. Again, each subdomain communicates with its neighbors following the pattern illustrated in Figure 5, but with the arrows reversed, and there is a synchronization barrier to form the global matrix $\tilde{S}$.

In Steps 5 and 6, each process takes a copy of the local part of $\tilde{S}$ relative to the extended interface $\bar{\Gamma}_J$, as in (3.10), communicating only with neighbors, and computes an incomplete LU factorization. The use of local copies avoids communication during the parallel application of $\tilde{S}$ throughout the linear solver iterations when using the RAS [5] version of the preconditioner.

In Step 7 the matrix $R_0 A R_0^T$ is initially formed as a parallel matrix, distributed among the processes. Local copies of this matrix are made in all processes and then an LU factorization is performed redundantly. The order of this matrix is P, which is in general quite small.

4.2 Preconditioner Application

Algorithm 2 describes the application of the preconditioner M to the vector r, i.e., the computation of z=Mr. We define two additional restriction operators, $R_{\Omega_J}:\Omega\to\Omega_J^{\mathrm{Int}}$ and $R_\Gamma:\Omega\to\Gamma$. These restriction operators are needed to operate on the correct entries of the vector; in the implementation they can be seen as subsets of indices.

Algorithm 2:
Preconditioner Application

Step 1 of Algorithm 2 solves a triangular linear system in each process involving the $\tilde{L}_J$ factor and the local interior portion of the residual, obtaining $z_J$. This step is done locally and no communication is necessary, because the residual r is distributed among the processes following the same parallel layout as the matrix.

Step 2 applies $\tilde{B}_J$, multiplying it by $z_J$ and subtracting the result from $r_\Gamma$ to obtain $z_\Gamma$; communication between neighboring subdomains is necessary only for $r_\Gamma$ and $z_\Gamma$, because $\tilde{B}_J$ is stored locally in process J.

The Schur complement application is done in Step 3. Because each process has a local copy of its part of $\tilde{S}$ factored into $L_{\tilde{S}_J}$ and $U_{\tilde{S}_J}$, only communication in $z_\Gamma$ between neighboring subdomains is necessary. As described above, $T_J$ acts as a diagonal operator, so we can store it as a vector and apply it through an element-wise product. The result is then summed globally into $z_\Gamma$. At this point there is a synchronization barrier among all processes.

In Step 4 we apply the $\tilde{U}_J$ factor to $z_J-\tilde{C}_J z_\Gamma$. The matrix-vector product $\tilde{C}_J z_\Gamma$ requires communication among neighbors in $z_\Gamma$; the triangular solve is then done locally with no communication.

The vectors $z_J$ and the vector $z_\Gamma$ are gathered into $z_F$ in Step 5, where $z_F$ is the residual preconditioned by the fine component of the preconditioner.

Step 6 applies the coarse component, completing the computation of the preconditioned residual. The triangular solves involved are local because there are local copies of $L_C$ and $U_C$ in each process, so no communication is necessary.
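Putting Steps 1-6 together, a serial sketch of the whole application follows (our reconstruction; in the actual code each J-indexed operation runs in its own MPI process, with the communication indicated in the comments).

```python
import numpy as np
from scipy.sparse.linalg import spsolve_triangular

def apply_iSchur(r_int, r_gamma, Lt, Ut, Bt, Ct, S_solve, R_idx, T_w, coarse):
    """r_int[J], r_gamma: residual pieces; Lt, Ut, Bt, Ct: per-subdomain CSR factors/blocks;
    S_solve[J]: callable applying (L_SJ U_SJ)^{-1}; R_idx[J], T_w[J]: indices/weights of R_J, T_J;
    coarse: callable applying M_C. All names are ours, for illustration only."""
    P = len(r_int)
    # Step 1: z_J = Lt_J^{-1} r_J (local triangular solves, no communication)
    z = [spsolve_triangular(Lt[J], r_int[J], lower=True) for J in range(P)]
    # Step 2: z_Gamma = r_Gamma - sum_J Bt_J z_J (neighbor communication on the interface)
    z_g = r_gamma - sum(Bt[J] @ z[J] for J in range(P))
    # Step 3: z_Gamma <- S_F^{-1} z_Gamma = sum_J T_J (L_SJ U_SJ)^{-1} R_J z_Gamma
    acc = np.zeros_like(z_g)
    for J in range(P):
        acc[R_idx[J]] += T_w[J] * S_solve[J](z_g[R_idx[J]])   # weighted (RAS) scatter
    z_g = acc
    # Step 4: z_J <- Ut_J^{-1} (z_J - Ct_J z_Gamma)
    z = [spsolve_triangular(Ut[J], z[J] - Ct[J] @ z_g, lower=False) for J in range(P)]
    # Steps 5-6: gather the fine result and add the coarse correction
    return np.concatenate(z + [z_g]) + coarse(np.concatenate(r_int + [r_gamma]))
```

The last line combines the components additively, as in (3.13); the multiplicative variant (3.14) would follow the pattern of the sketch given after equation (3.1).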

5 COMPARISON TESTS

We use two metrics to evaluate the preconditioners: the number of iterations and the solution time. In this section we describe the platform where the tests were run, present the results of the proposed preconditioner, and compare with a native PETSc preconditioner.

We consider five matrices from reservoir simulation problems. They are linear systems arising from applying Newton's method to solve the nonlinear equations discretizing the system of PDEs that model multiphase flow in porous media. Matrices F8, U26, U31 and Psalt were generated from a Petrobras reservoir simulation tool. The Psalt problem is a synthetic presalt reservoir model. The fifth matrix is the standard SPE10 model [10], with over a million active cells. All problems are black-oil models using the fully implicit method, with four degrees of freedom per gridcell.

Table 1 shows the size of each problem: the total number of cells and the dimension of the corresponding matrix.

Table 1:
Test case problems.

For both preconditioners the problems were solved using PETSc's native right-preconditioned GMRES method [27], with a relative residual tolerance of 1e-4, a maximum of 1000 iterations, and restarts every 30 iterations. The mesh was partitioned using PT-Scotch [9]. iSchur uses fill-in level 1 in the subdomain interiors and level zero in the interface-related parts, namely $k_{\mathrm{Int}}=1$, $k_{\mathrm{Bord}}=k_{\mathrm{Prod}}=k_\Gamma=0$, as this configuration presented the best behavior in preliminary Matlab tests.
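For reference, these baseline solver settings can be reproduced with a few lines of petsc4py (a sketch of the block Jacobi baseline only; iSchur itself is custom code built on top of PETSc primitives, and the ILU level set below is an adjustable choice, not a value prescribed here).

```python
from petsc4py import PETSc

def make_baseline_solver(A):
    # A is a distributed PETSc Mat; returns a configured KSP solver
    ksp = PETSc.KSP().create(comm=A.getComm())
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setTolerances(rtol=1e-4, max_it=1000)
    ksp.getPC().setType(PETSc.PC.Type.BJACOBI)
    opts = PETSc.Options()
    opts['ksp_pc_side'] = 'right'        # right preconditioning
    opts['ksp_gmres_restart'] = 30       # restart every 30 iterations
    opts['sub_pc_type'] = 'ilu'          # ILU on each Jacobi block
    opts['sub_pc_factor_levels'] = 1     # ILU fill level per block (PETSc's default is 0)
    ksp.setFromOptions()
    return ksp
```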

We compare the proposed two-level preconditioner with PETSc's native block Jacobi preconditioner (see [3]) combined (as in equation (3.1)) with the same coarse component presented in this paper. We refer to this combination simply as block Jacobi. We note that block-Jacobi/ILU is the default preconditioner in PETSc and is a common choice of preconditioner in reservoir simulation, which makes it an appropriate benchmark.

The first set of tests was run on an Intel Xeon(R) 64-bit CPU E5-2650 v2 @ 2.60GHz with 16 cores (hyper-threading disabled), 20MB LLC, 64GB RAM (ECC enabled). The operating system is a Debian GNU/Linux distribution running kernel version 3.2.65. The compiler is Intel Parallel Studio XE 2015.3.187 Cluster Edition.

The iSchur preconditioner, both in the setup and application phases, requires more computation and communication than block Jacobi, so it has to present a pronounced reduction in the number of iterations in order to be competitive. Table 2 shows the number of iterations and the total time, in seconds, for both preconditioners, for all the tested problems, ranging from one to sixteen processes. For each test, the best results are highlighted in gray. We can observe that the iSchur preconditioner does reduce the number of iterations compared to block Jacobi for all problems and process counts. For one process (sequential execution), both preconditioners are identical, as there are no interface points, so we show the results just for the sake of measuring speedup and scalability. On average, iSchur takes 80% as many iterations as block Jacobi. In this set of experiments, iSchur was more robust than block Jacobi, as the latter failed to converge in two experiments.

Table 2:
Number of iterations and the total time for both preconditioners from 1 to 16 processes. (NC) stands for “not converged”.

We also observe that, for some of the matrices, the number of iterations increases with the number of processes. This is to be expected, since block Jacobi neglects larger portions of the matrix as the size of the subdomains decreases. A related effect occurs with iSchur: since we use different levels of fill in the interface-related parts and in the interior parts, more information is neglected as the number of subdomains grows and the number of interface points increases.

Table 2 also shows the total time of the solution using both preconditioners. We can see that, despite the reduction in the number of iterations, the solver time of iSchur is smaller than block Jacobi's in only a few cases. One of the main reasons is that the iSchur setup time is greater than block Jacobi's, especially when running on few processes. Although iSchur's setup is intrinsically more expensive than block Jacobi's, we believe that this part of our code can still be further optimized to reduce this gap.

In reservoir simulation, the sparsity pattern of the Jacobian matrix only changes when the operating conditions of the wells change, and therefore it remains the same over the course of many nonlinear iterations. This is an opportunity to spare some computational effort in the setup phase, since in this case the symbolic factorization does not need to be redone. We can see in Table 3 the time comparison when the symbolic factorization is not considered; in this scenario iSchur is faster in most cases. Furthermore, block Jacobi is a highly optimized, mature code (the default PETSc preconditioner), while iSchur could possibly be further optimized.

Table 3:
Time of iterative solver without the symbolic factorization.

In order to examine the behavior of the preconditioners as the number of subdomains increases, we ran tests up to 512 processes on a cluster consisting of nodes with 16GB RAM and two Intel Xeon E5440 Quad-core processors each, totaling 8 CPU cores per node. The results are depicted in Figure 6. We see that iSchur consistently outperforms block Jacobi in terms of number of iterations. Nonetheless, the behaviour of both preconditioners, with respect to scalability, cannot be predicted from these experiments. Tests with more MPI processes, in a larger cluster, are still required.

Figure 6:
Comparison between block Jacobi and iSchur up to 512 processes.

6 CONCLUSIONS AND REMARKS

The presented preconditioner successfully reduced the number of iterations for the target reservoir problems compared to block Jacobi for the full range of partitions tested.

The scalability of both preconditioners is an ongoing research topic, as tests on a larger cluster are still necessary, but these initial results indicate a slight advantage for iSchur.

For the current implementation, the total solution time (setup plus iterative solve) of iSchur is smaller in only a few cases; one important factor is the inefficiency of the setup step. The block Jacobi implementation in PETSc is highly optimized, while ours still has room for improvement.

When the symbolic factorization time is removed, considering the case in which it is done once and reused many times, which is usual in reservoir simulation, the solution using the iSchur preconditioner is faster for most tested cases. This shows that the presented preconditioner can be a better alternative than block Jacobi for the fine component of two-level preconditioners when solving large-scale reservoir simulation problems. Furthermore, we believe the application part of the code can still be optimized.

REFERENCES

  • 1
    T. M. Al-Shaalan, H. Klie, A.H. Dogru, & M. F. Wheeler. Studies of robust two stage preconditioners for the solution of fullyimplicit multiphase flow problems. In SPE Reservoir Simulation Symposium, 2-4 February, The Woodlands, Texas,number SPE-118722 in MS, (2009).
  • 2
    D. A. Augusto, L. M. Carvalho, P. Goldfeld, I. C. L. Nievinski, J.R.P. Rodrigues, & M. Souza. An algebraic ILU(k) based two-level domain decomposition preconditioner. In Proceeding Series of the Brazilian Society of Computational and Applied Mathematics, 3 (2015), 010093-1-010093-7.
  • 3
    S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, K. Rupp, B.F. Smith, S. Zampini, H. Zhang, & H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.7, Argonne National Laboratory, (2016).
  • 4
    M. Bollhöfer & V. Mehrmann. Algebraic multilevel methods and sparse approximate inverses. SIAM Journal on Matrix Analysis and Applications, 24(1) (2002), 191-218.
  • 5
    X. Cai & M. Sarkis. A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems. SIAM Journal on Scientific Computing, 21(2) (1999), 792-797.
  • 6
    H. Cao, H. Tchelepi, J. Wallis & H. Yardumian. Parallel scalable unstructured CPR-type linear solver for reservoir simulation. In SPE Annual Technical Conference and Exhibition, 9-12 October 2005, Dallas, Texas. Society of Petroleum Engineers, (2005).
  • 7
    L. M. Carvalho, L. Giraud & P. L. Tallec. Algebraic two-level preconditioners for the Schur complement method, SIAM Journal on Scientific Computing, 22(6) (2001), 1987-2005.
  • 8
    T. F. Chan, J. Xu & L. Zikatanov. An agglomeration multigrid method for unstructured grids. Contemporary Mathematics, 218 (1998), 67-81.
  • 9
    C. Chevalier & F. Pellegrini. PT-Scotch: A tool for efficient parallel graph ordering. Parallel Computing, 34(6) (2008), 318-331.
  • 10
    M. A. Christie & M. J. Blunt. Tenth SPE comparative solution project: A comparison of upscaling techniques. In SPE Reservoir Simulation Symposium, (2001).
  • 11
    D. A. Collins, J.E. Grabenstetter & P. H. Sammon. A Shared-Memory Parallel Black-Oil Simulator with a Parallel ILU Linear Solver. SPE-79713-MS. In SPE Reservoir Simulation Symposium, Houston, 2003. Society of Petroleum Engineers.
  • 12
    L. Formaggia & M. Sala. Parallel Computational Fluids Dynamics. Practice and Theory., chapter Algebraic coarse grid operators for domain decomposition based preconditioners. Elsevier, (2002), 119-126.
  • 13
    S. Gries, K. Stüben, G. L. Brown, D. Chen & D. A. Collins. Preconditioning for efficiently applying algebraic multigrid in fully implicit reservoir simulations. SPE Journal, 19(04): 726-736, August 2014. SPE-163608-PA.
  • 14
    H. Jackson, M. Taroni & D. Ponting. A two-level variant of additive Schwarz preconditioning for use in reservoir simulation. arXiv preprint arXiv: 1401.7227, (2014).
  • 15
    P. Jenny, S. Lee & H. Tchelepi. Multi-scale finite-volume method for elliptic problems in subsurface flow simulation. Journal of Computational Physics, 187(1) (2003), 47- 67.
  • 16
    P. Jenny, S. Lee & H. Tchelepi. Adaptive fully implicit multi-scale finite-volume method for multi-phase flow and transport in heterogeneous porous media. Journal of Computational Physics, 217(2) (2006), 627-641.
  • 17
    P. Jenny, S. H. Lee & H. A. Tchelepi. Adaptive multiscale finite-volume method for multiphase flow and transport in porous media. Multiscale Modeling & Simulation, 3(1) (2005), 50-64.
  • 18
    A. Manea, J. Sewall & H. Tchelepi. Parallel multiscale linear solver for reservoir simulation. Lecture Notes - slides - 4th SESAAI Annual Meeting Nov 5, 2013, (2013).
  • 19
    A. Muresan & Y. Notay. Analysis of aggregation-based multigrid. SIAM Journal on Scientific Computing, 30(2) (2008), 1082-1103.
  • 20
    R. Nabben & C. Vuik. A comparison of deflation and coarse grid correction applied to porous media flow. SIAM J. Numer. Anal., 42 (2004), 1631-1647.
  • 21
    F. Nataf, H. Xiang & V. Dolean. A two level domain decomposition preconditioner based on local Dirichlet-to-Neumann maps. Comptes Rendus Mathematique, 348(21-22) (2010), 1163-1167.
  • 22
    Y. Notay. Algebraic multigrid and algebraic multilevel methods: a theoretical comparison. Numer. Linear Algebra Appl, 12(5-6) (2005), 419-451.
  • 23
    Y. Notay. Algebraic analysis of two-grid methods: The nonsymmetric case. Numerical Linear Algebra with Applications, 17(1) (2010), 73-96.
  • 24
    D. W. Peaceman. Fundamentals of Numerical Reservoir Simulation. Elsevier, (1977).
  • 25
    A. Quarteroni & A. Valli. Applied and Industrial Mathematics: Venice - 1, 1989, chapter Theory and Application of Steklov-Poincaré Operators For Boundary-Value Problems. Kluwer, (1991), 179-203.
  • 26
    Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2nd edition, (2003).
  • 27
    Y. Saad & M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3) (1986), 856-869.
  • 28
    Y. Saad & J. Zhang. BILUM: Block versions of multielimination and multilevel ILU preconditioner for general sparse linear systems. SIAM Journal on Scientific Computing, 20(6) (1999), 2103-2121.
  • 29
    B. Smith, P. Bjørstad & W. Gropp. Domain Decomposition, Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, New York, 1st edition, (1996).
  • 30
    J. Tang, R. Nabben, C. Vuik & Y. Erlangga. Theoretical and numerical comparison of various projection methods derived from deflation, domain decomposition and multigrid methods. Reports of the Department of Applied Mathematical Analysis 07-04, Delft University of Technology, January 2007.
  • 31
    U. Trottenberg, C. Oosterlee & A. Schuller. Multigrid. Academic Press, Inc. Orlando, USA, (2001).
  • 32
    C. Vuik & J. Frank. Coarse grid acceleration of a parallel block preconditioner. Future Generation Computer Systems, JUN, 17(8) (2001), 933-940.
  • 33
    J. Wallis, R. Kendall & T. Little. Constrained residual acceleration of conjugate residual methods. In SPE Reservoir Simulation Symposium, number SPE 13563 in 8 th Symposium on Reservoir Simulation, Dallas, 1985. Society of Petroleum Engineers.
  • This article is based on work presented at CNMAC2016.
  • 1
    Had the condition K>J been dropped from definition (2.3), Γ would still be a separator set. But it would be unnecessarily large, which would yield a more expensive preconditioner.
  • 2
    If A is structurally symmetric, then $A^T$ and A have the same nonzero structure but are not necessarily equal.

Publication Dates

  • Publication in this collection
    Jan-Apr 2018

History

  • Received
    28 Nov 2016
  • Accepted
    26 Sept 2017