We investigate how to use an LU factorization with the classical lsqr routine for solving overdetermined sparse least squares problems. Usually $L$ is much better conditioned than $A$, so iterating with $L$ instead of $A$ results in faster convergence. Numerical experiments using Matlab illustrate the good behavior of our algorithm in terms of storage and convergence. This paper explores a preliminary shared-memory implementation using SuperLU factorization.
We consider the overdetermined full-rank linear least squares (LLS) problem
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2, \qquad A \in \mathbb{R}^{m \times n}, \; m \ge n, \; b \in \mathbb{R}^m.$$
As with other sparse linear systems, preconditioning techniques based on incomplete factorizations can improve convergence. One method to precondition the normal equations $A^TAx = A^Tb$ is to perform an incomplete Cholesky decomposition of $A^TA$ (e.g., the RIF preconditioner; Benzi, 2003).
When $A^TA$ and its Cholesky factorization are denser than $A$, it is natural to wonder whether the LU factorization of $A$ can be used in solving the least squares problem. In this paper we use an LU factorization of the rectangular matrix $A$, $PA = LU$, where $L$ is unit lower trapezoidal and $U$ is upper triangular. For the nonpivoting case, the normal equations $A^TAx = A^Tb$ become
$$U^TL^TLUx = U^TL^Tb,$$
so with $y = Ux$ one can iterate with $L$ and then recover $x$ by solving the triangular system $Ux = y$.
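This reduction can be sketched with dense matrices. The following Python/SciPy illustration (not the sparse SuperLU implementation used in this paper, and with an arbitrary random test matrix) factors $A$ with row pivoting, iterates on $L$ with LSQR, and back-substitutes through $U$:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Rectangular LU with partial pivoting: A = P L U, where
# L is m-by-n unit lower trapezoidal and U is n-by-n upper triangular.
P, L, U = lu(A)

# With y = U x, min ||A x - b|| becomes min ||L y - P^T b||;
# iterate with the (typically better conditioned) L via LSQR.
y = lsqr(L, P.T @ b, atol=1e-12, btol=1e-12, iter_lim=1000)[0]

# Recover x from the triangular system U x = y.
x = solve_triangular(U, y)

# Reference dense least squares solution for comparison.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The same two-phase structure (factor once, then iterate on the better-conditioned factor) carries over to the sparse case, with the dense `lu` call replaced by a sparse direct solver.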
Least squares solution using LU factorization has been explored by several authors: Peters and Wilkinson (1970) and Björck and Duff (1980) give direct methods. This work follows Björck and Yuan (1999) in using conjugate gradient methods based on LU factorization, an approach worth revisiting in light of recent progress in sparse LU factorization. The lsqrLU algorithm (2015) presented here uses the lower trapezoidal $L$ returned by a direct solver package. Here we use an LU factorization from shared-memory SuperLU (X. Li, 2005), comparing iteration with $L$ to the iteration suggested by Saunders. Because other direct solver packages offer scalable sparse LU factorizations, it appears likely that the algorithm used here can also be applied to larger problems. The rate of linear convergence for CG iterations on the normal equations is
$$\frac{\kappa - 1}{\kappa + 1},$$
where $\kappa = \kappa_2(A)$ denotes the 2-norm condition number of $A$ (the ratio of its largest and smallest singular values). In our experiments, $L$ is often much better conditioned than $A$, so convergence of the CG method is relatively rapid. Moreover, the total number of nonzeros in $L$ and $U$ is usually less than in the sparse Cholesky factorization of $A^TA$.
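The conditioning gap can be checked numerically. This NumPy/SciPy sketch builds a synthetic dense matrix with prescribed singular values (a hypothetical test case, not one of the paper's sparse problems) and compares $\kappa_2(A)$ with $\kappa_2(L)$ and the resulting linear rates:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(1)
m, n = 300, 60

# Synthetic ill-conditioned A with singular values 1 down to 1e-6.
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)
A = (Q1 * s) @ Q2.T

_, L, _ = lu(A)                      # unit lower trapezoidal factor

kappa_A = np.linalg.cond(A)          # ~1e6 by construction
kappa_L = np.linalg.cond(L)          # typically far smaller: partial
                                     # pivoting bounds |L_ij| by 1

# Linear convergence factor (kappa - 1)/(kappa + 1): smaller is faster.
rate_A = (kappa_A - 1) / (kappa_A + 1)
rate_L = (kappa_L - 1) / (kappa_L + 1)
```

With partial pivoting the ill-conditioning of $A$ tends to be absorbed into $U$, leaving $L$ well conditioned, which is why iterating with $L$ gives the sharper rate.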