franceschini_an096701

Next: References

Andrea Franceschini
Multilevel Approaches for FSAI Preconditioning

Department of Civil and Environmental Engineering
University of Padova
via Trieste 63
35121 Padova (Italy)
andrea.franceschini@dicea.unipd.it
Victor, A. Magri
Carlo Janna
Massimiliano Ferronato

The solution of a large size sparse linear system of equations

$\displaystyle \mathbf{A} x = b,$

(1)

where $\mathbf{A}$ is a SPD matrix, requires the use of iterative methods based on Krylov subspaces. It is well known, however, that the use of these methods alone generally provide poor convergence. This can be remedied by using their preconditioned form, which requires the definition of a preconditioner, i.e., a linear transformation that approximates the action of $\mathbf{A}^{-1}$ .

In this paper, two new preconditioners built upon the dynamic FSAI algorithm [Janna et al., 2015] are considered: the BTFSAI - Block Tridiagonal Factorized Sparse Approximate Inverse - and the DDFSAI - Domain Decomposition Factorized Sparse Approximate Inverse. The first one is based on the successive application of the FSAI algorithm on the matrix reordered in a block tridiagonal structure and the second one is based on the combination of a two level domain decomposition approach and the FSAI preconditioner, which is used for computing the approximate inverse of the inner submatrices and the Schur complement.

In the Block Tridiagonal preconditioner - BTFSAI - the matrix $\mathbf{A}$ is first reordered by the reverse Cuthill-McKee algorithm, with the aim to reduce its bandwidth, and then it is divided into a block tridiagonal structure according to a given number of blocks. After that, the block LDU decomposition of this new matrix is calculated where the inverses of the diagonal blocks are approximated explicitly by the adaptive FSAI algorithm.

With this approach, the global matrix is subdivided into small blocks, that should be approximated with less effort. Though this method is intrinsically sequential, parallelism can be exploited for each block by the use of the FSAI algorithm. Calling $\mathbf{A}_i$ the SPD diagonal submatrix and $\mathbf{B}_i$ the upper diagonal submatrix of the block, the block LDU decomposition reads

$\displaystyle \mathbf{A} = \begin{bmatrix}\mathbf{I}_1 & & & \ \mathbf{B}^T_1 ... ...-1}_2 \mathbf{B}_2 & \ & & \ddots & \ddots \ & & & \mathbf{I}_n \end{bmatrix}$

(2)

where

$\displaystyle \mathbf{S}_i = \mathbf{A}_i - \mathbf{B}^T_{i-1} \mathbf{S}^{-1}_{i-1} \mathbf{B}_{i-1}, \qquad \qquad i = 2 \dots n,$

(3)

is the

block Schur complement and $\mathbf{S}_1 = \mathbf{A}_1$ , by definition. Let $\mathbf{S}^{-1}_i = \mathbf{F}^T_i \mathbf{F}_i$ be the approximate inverse of the Schur complement computed by the adaptive FSAI algorithm and let the matrix $\mathbf{H}_i$ be equal to the product $\mathbf{F}_i \mathbf{B}_i$ , it follows that the Schur complement recursively as $\mathbf{S}_i = \mathbf{A}_i - \mathbf{H}^T_{i-1} \mathbf{H}_{i-1}$ . Using these relations, the approximate inverse of the matrix $\mathbf{A}$ reads

$\displaystyle \mathbf{A}^{-1} \approx \mathbf{M}^{-1} = \begin{bmatrix}\mathbf{... ...athbf{F}_n \mathbf{H}^T_{n-1} \mathbf{F}_{n-1} & \mathbf{F}_n \ \end{bmatrix}.$

(4)

The main drawback of this approach is the need of computing

Schur complements along with their FSAI approximations. While theoretically these matrices should be SPD, the use of approximate inverses may cause the loss of this property. Indeed, for some matrices, we can have indefinite Schur complements if the approximate inverses are not accurate enough. In this case, the construction of the preconditioner should be restarted allowing for a larger number of nonzero entries per row.

Roughly speaking, domain decomposition refers to a number of techniques which solve a problem defined on a large domain $\Omega$ via the solution of smaller problems defined over subdomains $\Omega_k, k = 1,\ldots,n$ . Each problem can be solved independently and the solutions can be gathered to build the outcome of the original problem.

In the domain decomposition FSAI preconditioner - DDFSAI - the graph of the original matrix is reordered according to the k-way partition algorithm implemented in the METIS software library. With this method, the number of edges connecting a subdomain to the others is minimized, thus reducing the pieces of information that need to be communicated among different blocks. At the same time, the size of subdomains is kept to be approximately of the same order in favour of providing a good load balance among different threads, which are going to work in one or possibly more blocks. After this, each independent block of the matrix is reordered according to the reverse Cuthill-McKee algorithm in order to reduce its bandwidth.

The algorithm for constructing the DDFSAI preconditioner is almost the same as BTFSAI restricted to two blocks, with the only difference consisting in the initial ordering of the matrix, which in the current case uses the METIS library besides of the reverse Cuthill-McKee ordering.

The results show that the BTFSAI technique is very promising in terms of decreasing the total number of iterations and time for achieving the solution. Further, with the DDFSAI technique, the computational pain is smaller, but stability and scalability are generally better. We remark that these strategies are under development and further computational improvements are being addressed, for instance a hybrid MPI/OpenMP implementation.

Next: References

root 2016-02-22