janna_carlo___

Carlo Janna
Supernodes in Factored Sparse Approximate Inverse preconditioning

Dept ICEA - University of Padova
via Trieste 63
35121 Padova
Italy
carlo.janna@unipd.it
Massimiliano Ferronato
Giuseppe Gambolati

In the solution of sparse linear systems, the flop count required by direct methods is generally much higher than that of cost-effective iterative methods, especially in three dimensional problems. However, direct methods are still competitive, as they can reach much higher flop rates. This is possible by the use of supernodes, that is clusters of unknowns that are processed together with cache-efficient dense linear algebra kernels. On the contrary, iterative methods can rarely give the same flops performances as direct methods because they require an indirect memory indexing preventing intensive use of the processor cache. Supernodes have been successfully exploited in the field of incomplete LU preconditioning though with lower flops than in direct methods.

In this work we extend the concept of supernode to Factored Sparse Approximate Inverse (FSAI) preconditioning [2] to reduce the set-up cost and improve the overall solver convergence. The FSAI preconditioner $M^{-1}$ is defined as:

$\displaystyle M^{-1} = G^T G$

(1)

where

is an approximation of the inverse of the exact lower Cholesky factor

of the system matrix

. The supernodes for FSAI preconditioning are developed for both a static and a dynamic pattern generation of

Static FSAI. Suppose that the FSAI non-zero pattern $\mathcal{S}$ is given and denote by $\mathcal{P}_i$ the set of column indices with cardinality $m_i=\left\vert\mathcal{P}_i\right\vert$ belonging to the -th row of $\mathcal{S}$ . Denoting by $A[\mathcal{P}_i,\mathcal{P}_i]$ the submatrix of constructed by using the indices in $\mathcal{P}_i$ and by $\widetilde{\mbox{\boldmath $ g$}}_i$ the dense vector collecting the non-zero entries of $\widetilde{G}$ , the following relation holds:

$\displaystyle A[\mathcal{P}_i,\mathcal{P}_i] \widetilde{\mbox{\boldmath$ g$}}_i = \mbox{\boldmath$ e$}_{m_i} \qquad \qquad \qquad i = 1,\dots,n$

(2)

i.e., the coefficients of any row of $\widetilde{G}$ are obtained by solving a dense linear system where $_{m_i} = [0,0,\dots,1]^T$ is the -th vector of the canonical basis of $\mathbb{R}^{m_i}$ . If two rows, let us say and , have a similar pattern, i.e., $\mathcal{P}_{i_1} \simeq \mathcal{P}_{i_2}$ , solving one enlarged system obtained from merging $\mathcal{P}_{i_1}$ and $\mathcal{P}_{i_2}$ into $\mathcal{P}_{\widetilde{i}}$ can be less expensive than solving two smaller systems. This pair of rows form a supernode that can be compared to other rows and, if convenient, merged to other sets of indices, thus enlarging the supernode. The collapse of a row into a supernode is decided on the basis of the computational cost for solving (2). Defining a cost model for solving dense systems of size with right-hand sides, a supernode $\widetilde{i}$ merging and is formed if:

$\displaystyle c(\widetilde{m},2) < c(m_1,1) + c(m_2,1)$

(3)

Similarly, a node is incorporated into the supernode $\widetilde{i}$ that has already merged rows if:

$\displaystyle c(\widetilde{m}+h,\widetilde{l}+1) < c(\widetilde{m},\widetilde{l}) + c(m_k,1)$

(4)

where denote the cardinality of the set $\mathcal{P}_{i_k}$ , $\widetilde{m}$ the cardinality of $\mathcal{P}_{\widetilde{i}}$ and the number of mismatches, that is the number of column indices lying in $\mathcal{P}_{i_k}$ but not in $\mathcal{P}_{\widetilde{i}}$ .

Dynamic FSAI. We follow the adaptive approach proposed in [1] for Block FSAI preconditioning and specified as pure FSAI in [2]. The idea is to start from an initial guess, say , and improve dynamically the pattern $\mathcal{S}_0$ by adding the entries that most reduce the Kaporin condition number of the preconditioned matrix . A new FSAI, , is thus computed, and the procedure is repeated times until either the maximum number of steps is reached or an appropriate exit criterion is met. To lower the computational cost of the above procedure we use the concept of supernode that merges together groups of rows. With no loss of generality, we merge a set of sequential rows of $\widetilde{G}$ with indices in $\mathcal{R}= \{i_1, i_2,\dots,i_l \}$ . A supernode is defined if:

(1): every row in the cluster has the same non-zero pattern outside $\widetilde{G}[\mathcal{R},\mathcal{R}]$ ;
(2): the diagonal block is an $l \times l$ identity matrix, $\widetilde{G}[\mathcal{R},\mathcal{R}] \equiv I_l$ .

It can be proved that the Kaporin number of is minimum iff for each supernode $\widetilde{G}$ satisfy:

$\displaystyle A[\mathcal{C},\mathcal{C}] \; \widetilde{G}[\mathcal{R},\mathcal{C}]^T = -A[\mathcal{C},\mathcal{R}]$

(5)

and is then obtained by scaling $\widetilde{G}$ with a block diagonal matrix such that the diagonal entries of are unitary. In equation (5) $\mathcal{C}$ is the set of column indices outside $\widetilde{G}[\mathcal{R},\mathcal{R}]$ . With a dynamic FSAI supernodes size is user-specified, forcing groups of adjacent rows to merge. For these rows, the entries of $\widetilde{G}$ are computed by equation (5) where a new column is introduced whenever it allows for a significant decrease of the current value of the Kaporin number of . A score for each column index is computed and the supernode pattern is enlarged until either a maximum number of columns are added or an appropriate exit criterion is met.

The Supernodes performance is investigated in both static and dynamic FSAI with a set of numerical experiments. The results show that they are effective in reducing significantly the preconditioner set-up time. Experience indicates that super nodes can improve performance up to 6 times better than the native FSAI.

References:

[1] C. Janna and M. Ferronato, Adaptive pattern research for Block FSAI preconditioners, SIAM Journal on Scientific Computing, 33 (2011), 3357-3380.

[2] C. Janna, M. Ferronato, F. Sartoretto and G. Gambolati, FSAIPACK: a software package for high performance Factored Sparse Approximate Inverse preconditioning, ACM Transactions on Mathematical Software, submitted.

About this document ...

Next: About this document ...

Copper Mountain 2014-02-24