A useful statistical measure is the diagonal or the trace of the inverse of
a matrix. This has myriad applications from quantum mechanics and Lattice QCD
to uncertainty quantification and data mining.
Most commonly the matrix
or its Hermitian part
would be Hermitian
positive definite and very large and sparse (large dimension
). We are
interested in cases where a direct factorization of the matrix is not possible.
For such large matrices, the current state-of-the-practice is to use a
Monte Carlo method known as Hutchinson's which is based on the fact that
the trace can be obtained as the expected value of
.
Therefore, a simple averaging algorithm can be implemented where random vectors
are chosen and linear systems
are solved with an iterative method.
The method converges as
, where the variance of the
estimator,
, is known to be minimized by choosing
with random
Rademacher entries
(in the real case).
This variance is given by
The two computational bottlenecks are (a) for ill conditioned systems iterative
methods may take many steps and (b) thousands of linear systems are needed just
to get two digits of accuracy in the trace.
In earlier work, we showed how approximate eigenvectors and eigenvalues can
be used to deflate, and thus remove the ill conditioning in linear systems
with multiple right hand sides.
We also developed Hierarchical Probing, a method to systematically produce
the vectors
in Hutchinsons's method so that variance is reduced significantly.
In this talk we focus on the effect of deflation on reducing the variance,
and how it can complement Hierarchical Probing to achieve substantial further
improvements.
Assume for simplicity that we want to find tr(
) (not tr(
)).
Consider its singular value decomposition,
where
corresponds to the largest
singular triplets of
.
Clearly, the trace of
is trivial to compute direclty.
Thus, using the appropriate projector, we can apply Hutchinson's method
on the deflated
, hoping this has lower variance.
We prove that
where
To bypass the complications caused by
,
we assume that
and
are standard random unitary matrices, i.e.,
distributed with the Haar probability measure.
For large size matrices this is a reasonable assumption, and it is also
theoretically supported for matrices in Lattice QCD.
Based on an analysis of the moments of the elements of these matrices, we have
derived an expression for the expected variance of the deflated estimator.
Define the mean and the variance of the
singular values of
,
, and
, respectively.
Then, for non-Hermitian matrices it holds
![]() |
![]() |
The second contribution of the paper is that we observed a synergistic action between deflation and hierarchical probing. Specifically, we analyzed the geometric effect of the two methods and observed that while hierarchical probing removes, by design, the error in neighboring lattice nodes at increasing distances, deflation is not good at that but excels in removing error in long lattice distances. In our experiments on a very large Lattice QCD problem, while deflation alone gave a small improvement, Hierarchical Probing reduced variance by an order of magnitude, and combining the two methods gave us an additional order of magnitude; an overall speedup of more than 200.