===firstname: Gary W.
===firstname3:
===affil6:
===lastname3:
===email: gwhowell@ncsu.edu
===keyword_other2: Linear Least Squares
===lastname6:
===affil5:
===lastname4:
===lastname7:
===affil7:
===postal: 512 Farmington Woods Dr., Cary, NC 27511
===ABSTRACT: We investigate how to use an $LU$ factorization with the classical {\tt lsqr} routine for solving overdetermined sparse least squares problems. Usually $L$ is much better conditioned than $A$, so iterating with $L$ instead of $A$ results in faster convergence. Numerical experiments using Matlab illustrate the good behavior of our algorithm in terms of storage and convergence. This paper also explores a preliminary shared memory implementation using SuperLU factorization. As with other sparse linear systems, preconditioning techniques based on incomplete factorizations can improve convergence. One way to precondition the normal equations~(\ref{eq:ne}) is to perform an incomplete Cholesky decomposition of $A^T A$ (e.g., the RIF preconditioner~\cite{benzi03}). When $A^TA$ and its Cholesky factorization are denser than $A$, it is natural to ask whether the $LU$ factorization of $A$ can instead be used to solve the least squares problem. In this paper we use an $LU$ factorization of the rectangular matrix $A=\left( \begin{array}{c} A_1\\ A_2\\ \end{array} \right)$, where $L$ is unit lower trapezoidal and $U$ is upper triangular. In the nonpivoting case, the normal equations~(\ref{eq:ne}) become \begin{equation} \label{eq:neL} L^TLy=c, \end{equation} with $c = L^T b$ and $U x = y$, and we can apply CG iterations to~(\ref{eq:neL}). Least squares solution using $LU$ factorization has been explored by several authors: Peters and Wilkinson, and Bj\"orck and Duff, give direct methods. This work follows Bj\"orck and Yuan in using conjugate gradient methods based on an $LU$ factorization, an approach worth revisiting given recent progress in sparse $LU$ factorization.
The lsqrLU algorithm presented here uses the lower trapezoidal $L$ returned by a direct solver package. Here we use an $LU = PAQ$ factorization from shared memory SuperLU, comparing iteration with $L$ to the $U^{-1}A$ iteration suggested by Saunders. Because several other direct solver packages offer scalable sparse $LU$ factorizations, the algorithm used here can likely also be applied to larger problems. The rate of linear convergence for CG iterations on the normal equations is $$K = \frac{\kappa - 1}{\kappa + 1},$$ where $\kappa=\sqrt{\cond(A^TA)}=\cond(A)$ and $\cond(A)$ denotes the 2-norm condition number of $A$ (the ratio of its largest and smallest singular values). In our experiments, $L$ is often much better conditioned than $A$, so convergence of the CG method is relatively rapid. Moreover, the total number of nonzeros in $L$ and $U$ is usually less than in the sparse Cholesky factorization of $A^T A$.
===affil3:
===title: Solving Overdetermined Sparse Least Squares Problems by LU Preconditioning
===affil2: U. Paris Sud and INRIA, Paris, France
===lastname2: Baboulin
===firstname4:
===keyword1: Iterative Linear Algebraic Data Mining Techniques
===workshop: no
===lastname: Howell
===firstname5:
===keyword2: APP_OTHER
===otherauths:
===affil4:
===competition: no
===firstname7:
===firstname6:
===keyword_other1:
===lastname5:
===affilother:
===firstname2: Marc
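The abstract's central idea, factor $A = PLU$, iterate with the better-conditioned trapezoidal $L$, then back-substitute $Ux = y$, can be sketched in a few lines of Python with SciPy. This is a minimal dense illustration, not the authors' sparse SuperLU-based implementation; the matrix sizes, tolerances, and variable names are my own.

```python
# Sketch of the lsqrLU idea (illustrative, not the paper's code):
# factor A = P L U, solve min ||L y - P^T b|| with LSQR, then U x = y.
import numpy as np
from scipy.linalg import lu, solve_triangular
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
m, n = 200, 50                       # overdetermined: m > n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

P, L, U = lu(A)                      # A = P @ L @ U; L is m x n unit lower trapezoidal
y = lsqr(L, P.T @ b, atol=1e-12, btol=1e-12)[0]   # iterate with L instead of A
x = solve_triangular(U, y)           # back-substitute U x = y

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]      # reference least squares solution
print(np.linalg.norm(x - x_ref))
```

Since $P$ is a permutation, $\min\|Ax-b\|_2 = \min\|L(Ux) - P^Tb\|_2$, so solving the trapezoidal problem for $y$ and then $Ux = y$ recovers the least squares solution.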
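The convergence claim can also be illustrated numerically: compute $\cond(A)$ and $\cond(L)$ and compare the corresponding linear convergence factors $K = (\kappa-1)/(\kappa+1)$. The sketch below uses random data and my own helper name `cg_rate`; it is not taken from the paper's experiments.

```python
# Illustration of the CG convergence factor K = (kappa - 1)/(kappa + 1)
# on the normal equations; data and function names are illustrative only.
import numpy as np
from scipy.linalg import lu

def cg_rate(kappa):
    """Linear convergence factor for CG applied to the normal equations."""
    return (kappa - 1.0) / (kappa + 1.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 60))
_, L, _ = lu(A)                      # L is the 300 x 60 unit lower trapezoidal factor

kappa_A = np.linalg.cond(A)          # 2-norm condition number of A
kappa_L = np.linalg.cond(L)          # L is often better conditioned than A
print(f"cond(A) = {kappa_A:.2f}  ->  K = {cg_rate(kappa_A):.3f}")
print(f"cond(L) = {kappa_L:.2f}  ->  K = {cg_rate(kappa_L):.3f}")
```

A smaller $\kappa$ gives a smaller $K$ and hence fewer iterations for a fixed residual reduction, which is why iterating with a well-conditioned $L$ pays off.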