Rank structured matrices, such as
-matrices,
-matrices, and (Hierarchically) Semi-Separable
matrices have been used for designing fast algorithms for PDEs, integral
equations, etc. Even though these classes of methods
are ideal solver candidates for extreme-scale problems and machines
because of their low computational complexity, existing studies
focus mostly on theoretical aspects, and most methods are only implemented
on sequential machines. In the last couple of years, we have been working
on scalable HSS matrix factorization algorithms and embedding them into
the traditional sparse factorization methods, exploiting the
data-sparseness property of the intermediate dense submatrices.
The HSS-structured sparse factorization achieves nearly linear complexity
for certain PDEs. It can be used as a fast direct solver for some problems,
and more importantly, as an algebraic preconditioner for broader classes of
sparse linear systems.
With the emerging architectures with manycore nodes offering million-way parallelism, one of the key challenges facing the algorithm developers is the increasing rate of system failures. We are developing ABFT schemes for the critial HSS matrix operations, such as HSS-matrix-vector product and ULV factorization. Our fault detection and recovery mechanisms take full advantage of the hierarchical nature of the HSS structure, and can be implemented efficiently through containment domains on large-scale systems.
This is joint work with Brian Austin, Osni Marques, Artem Napov, and Eric Roman.