A potential bottleneck when solving Boltzmann transport equations in parallel is the inversion of the streaming operator. The discretized form of this operator is a lower triangular matrix or block lower triangular matrix with small blocks. The solution of these triangular systems by direct methods is inherently sequential. Although various overloading techniques have been used to amortize the costs of these lower triangular solves or ``sweeps'', the practicality of scaling to massively parallel machines with tens of thousands of processors is unclear.
In this talk, we will present new theoretical scaling models for sweeping
algorithms and compare with experiment. In theory, these algorithms have the
potential to scale like
, where
is the spatial dimension of
the problem,
is the number of directions, and
is the number of
processors. When
is fairly large, it masks the effect of the
term,
whereby delaying the poor asymptotic scaling behavior. This delay may be
adequate in some cases to get practical performance, even up to tens of
thousands of processors. However, some popular parallel sweep algorithms may
scale worse than this best-case theoretical model. This will also be discussed
in the talk.