===firstname: Gary W.
===firstname3:
===affil6:
===lastname3:
===email: gwhowell@ncsu.edu
===keyword_other2: Linear Least Squares
===lastname6:
===affil5:
===lastname4:
===lastname7:
===affil7:
===postal: 512 Farmington Woods Dr., Cary, NC 27511
===ABSTRACT: We investigate how to use an $LU$ factorization with the classical {\tt lsqr} routine for solving overdetermined sparse least squares problems. Usually $L$ is much better conditioned than $A$, so iterating with $L$ instead of $A$ results in faster convergence. Numerical experiments using Matlab illustrate the good behavior of our algorithm in terms of storage and convergence. This paper also explores a preliminary shared memory implementation using SuperLU factorization. As with other sparse linear systems, preconditioning techniques based on incomplete factorizations can improve convergence. One way to precondition the normal equations~(\ref{eq:ne}) is to perform an incomplete Cholesky decomposition of $A^T A$ (e.g., the RIF preconditioner~\cite{benzi03}). When $A^TA$ and its Cholesky factorization are denser than $A$, it is natural to ask whether the $LU$ factorization of $A$ can instead be used to solve the least squares problem. In this paper we use an $LU$ factorization of the rectangular matrix $A=\left( \begin{array}{c} A_1\\ A_2\\ \end{array} \right)$, where $L$ is unit lower trapezoidal and $U$ is upper triangular. In the nonpivoting case, the normal equations~(\ref{eq:ne}) become \begin{equation} \label{eq:neL} L^TLy=c, \end{equation} with $c = L^T b$ and $U x = y$, and we can apply CG iterations to~(\ref{eq:neL}). Least squares solution using $LU$ factorization has been explored by several authors: Peters and Wilkinson, and Bj\"orck and Duff, give direct methods. This work follows Bj\"orck and Yuan in using conjugate gradient methods based on an $LU$ factorization, an approach worth revisiting given recent progress in sparse $LU$ factorization.
The lsqrLU algorithm presented here uses the lower trapezoidal $L$ returned by a direct solver package. Here we use an $LU = PAQ$ factorization from shared memory SuperLU, comparing iteration with $L$ to the $U^{-1}A$ iteration suggested by Saunders. Because several other direct solver packages offer scalable sparse $LU$ factorizations, the algorithm used here can likely also be applied to larger problems. The rate of linear convergence for CG iterations on the normal equations is $$K = \frac{\kappa - 1}{\kappa + 1},$$ where $\kappa=\sqrt{\cond(A^TA)}=\cond(A)$ and $\cond(A)$ denotes the 2-norm condition number of $A$ (the ratio of its largest and smallest singular values). In our experiments, $L$ is often much better conditioned than $A$, so convergence of the CG method is relatively rapid. Moreover, the total number of nonzeros in $L$ and $U$ is usually less than in the sparse Cholesky factorization of $A^T A$.
===affil3:
===title: Solving Overdetermined Sparse Least Squares Problems by LU Preconditioning
===affil2: U. Paris Sud and INRIA, Paris, France
===lastname2: Baboulin
===firstname4:
===keyword1: Iterative Linear Algebraic Data Mining Techniques
===workshop: no
===lastname: Howell
===firstname5:
===keyword2: APP_OTHER
===otherauths:
===affil4:
===competition: no
===firstname7:
===firstname6:
===keyword_other1:
===lastname5:
===affilother:
===firstname2: Marc
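The abstract's central idea, factor $A = PLU$, iterate with the better-conditioned trapezoidal $L$, then back-substitute $Ux = y$, can be sketched in a few lines of Python with SciPy. This is a minimal dense illustration, not the authors' sparse SuperLU-based implementation; the matrix sizes, tolerances, and variable names are my own.

```python
# Sketch of the lsqrLU idea (illustrative, not the paper's code):
# factor A = P L U, solve min ||L y - P^T b|| with LSQR, then U x = y.
import numpy as np
from scipy.linalg import lu, solve_triangular
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
m, n = 200, 50                       # overdetermined: m > n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

P, L, U = lu(A)                      # A = P @ L @ U; L is m x n unit lower trapezoidal
y = lsqr(L, P.T @ b, atol=1e-12, btol=1e-12)[0]   # iterate with L instead of A
x = solve_triangular(U, y)           # back-substitute U x = y

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]      # reference least squares solution
print(np.linalg.norm(x - x_ref))
```

Since $P$ is a permutation, $\min\|Ax-b\|_2 = \min\|L(Ux) - P^Tb\|_2$, so solving the trapezoidal problem for $y$ and then $Ux = y$ recovers the least squares solution.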
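The convergence claim can also be illustrated numerically: compute $\cond(A)$ and $\cond(L)$ and compare the corresponding linear convergence factors $K = (\kappa-1)/(\kappa+1)$. The sketch below uses random data and my own helper name `cg_rate`; it is not taken from the paper's experiments.

```python
# Illustration of the CG convergence factor K = (kappa - 1)/(kappa + 1)
# on the normal equations; data and function names are illustrative only.
import numpy as np
from scipy.linalg import lu

def cg_rate(kappa):
    """Linear convergence factor for CG applied to the normal equations."""
    return (kappa - 1.0) / (kappa + 1.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 60))
_, L, _ = lu(A)                      # L is the 300 x 60 unit lower trapezoidal factor

kappa_A = np.linalg.cond(A)          # 2-norm condition number of A
kappa_L = np.linalg.cond(L)          # L is often better conditioned than A
print(f"cond(A) = {kappa_A:.2f}  ->  K = {cg_rate(kappa_A):.3f}")
print(f"cond(L) = {kappa_L:.2f}  ->  K = {cg_rate(kappa_L):.3f}")
```

A smaller $\kappa$ gives a smaller $K$ and hence fewer iterations for a fixed residual reduction, which is why iterating with a well-conditioned $L$ pays off.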