In many signal processing applications, the non-negative least squares (NNLS) problem of ``moderate'' size (a few hundred to a thousand variables) arises. Efficient solution of these problems would enable online applications, in which the estimation can be performed as data is acquired. We parallelize a version of the active-set iterative algorithm derived from the original algorithm of Lawson and Hanson (1974) on a graphics processor. This algorithm requires the solution of an unconstrained least squares problem in every step of the iteration for a matrix composed of the ``passive columns'' of the original system matrix. To achieve improved performance we use parallelizable procedures to efficiently update and ``downdate'' the QR factorization of the matrix at the current iteration to account for inserted and removed columns, and efficient data structures that account for GPU memory access patterns.
The NNLS problem has roots in data-modelling, where we optimize a set of underlying parameters that is used to describe observed data. The underlying parameters denote a set of $n$ variables in a vector $x$. The observed data is composed of $m$ observations in a vector $b$.
Suppose that the observed data are linear functions of the underlying parameters in the model; then the linear functions may be expressed as an $m \times n$ matrix $A$, where $A$ describes a linear mapping from the $n$ parameters in $x$ to the $m$ observations in $b$.
In the general case where $m > n$, the dense overdetermined system of linear equations $Ax = b$ may be solved via a least squares approach by decomposing matrix $A = QR$, where $Q$ is an orthogonal $m \times m$ matrix and $R$ is an upper-triangular $m \times n$ matrix. The resulting matrix equation may be rearranged as $Rx = Q^{T}b$ and solved via back-substitution.
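To make the procedure concrete, the following sketch (ours, in Python with NumPy; the matrix sizes and data are illustrative placeholders) solves an overdetermined system via a reduced QR decomposition and back-substitution:

\begin{verbatim}
# Minimal sketch: least squares solution of an overdetermined
# system A x = b via QR, following the rearrangement R x = Q^T b.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3                      # m > n: overdetermined
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A)           # reduced QR: Q is m x n, R is n x n
y = Q.T @ b                      # rearrange A x = b into R x = Q^T b
x = np.linalg.solve(R, y)        # back-substitution (R is upper-triangular)

# Agrees with the library least squares solver for full-rank A.
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
\end{verbatim}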
Sometimes the underlying parameters are constrained to be non-negative in order to reflect real-world prior information. When the data is corrupted by noise, the estimated parameters may violate these constraints, producing answers that are not usable. In these cases it is necessary to enforce the non-negativity constraints explicitly, and so we solve the NNLS problem
\[ \min_{x \ge 0} \; \| Ax - b \|_2 . \]
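As a minimal illustration of why the constraints must be enforced explicitly, the sketch below (ours; the problem size and noise level are arbitrary) compares an unconstrained solve against SciPy's nnls solver on noisy data generated from non-negative parameters:

\begin{verbatim}
# Minimal sketch: with noisy data, the unconstrained estimate can go
# negative even when the true parameters are non-negative; the NNLS
# solver enforces the constraint.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
m, n = 50, 5
A = rng.standard_normal((m, n))
x_true = np.abs(rng.standard_normal(n))        # non-negative ground truth
b = A @ x_true + 0.5 * rng.standard_normal(m)  # noisy observations

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]    # may violate x >= 0
x_nn, resid = nnls(A, b)                       # min ||A x - b||, x >= 0
print("unconstrained:", x_ls)
print("non-negative: ", x_nn)                  # all entries >= 0
\end{verbatim}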
The seminal work of Lawson and Hanson in ref. [3] provided the first widely used method for solving the NNLS problem. This algorithm, later referred to as the active-set method, partitions the set of parameters or variables into an active-set and a passive-set. The active-set contains the variables with value zero and those that violate the non-negativity constraints; the passive-set contains the variables that do not violate the constraints. By iteratively updating a feasibility vector with components from the passive-set, each iteration is reduced to an unconstrained linear least squares sub-problem that is solvable via QR decomposition.
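A minimal sketch of this active-set iteration is given below; the variable names (passive, w, z) are our own, and for brevity the sub-problem is solved with a library least squares call rather than the incremental QR described later:

\begin{verbatim}
# Minimal sketch of the Lawson-Hanson active-set iteration.
import numpy as np

def lawson_hanson_nnls(A, b, tol=1e-10, max_iter=200):
    m, n = A.shape
    passive = np.zeros(n, dtype=bool)   # all variables start active (at zero)
    x = np.zeros(n)
    for _ in range(max_iter):
        w = A.T @ (b - A @ x)           # gradient of the residual
        if passive.all() or np.max(w[~passive]) <= tol:
            return x                    # optimality conditions satisfied
        # move the most promising active variable into the passive set
        passive[np.argmax(np.where(passive, -np.inf, w))] = True
        while True:
            z = np.zeros(n)             # sub-problem on the passive columns
            z[passive] = np.linalg.lstsq(A[:, passive], b, rcond=None)[0]
            if np.min(z[passive]) > 0:
                x = z
                break
            # step toward z only as far as feasibility allows, then
            # return the variables that hit zero to the active set
            mask = passive & (z <= 0)
            alpha = np.min(x[mask] / (x[mask] - z[mask]))
            x = x + alpha * (z - x)
            passive &= x > tol
    return x

# Should agree with scipy.optimize.nnls(A, b)[0] up to tolerance.
\end{verbatim}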
We denote the unconstrained sub-problem as the linear system $A^{P}x^{P} = b$, where matrix $A^{P}$ contains the column vectors in matrix $A$ that correspond to variables in the passive-set. Observe that any changes between the active and passive-sets at each iteration are generally limited to the exchange of a single variable; usually one column vector is added to or removed from $A^{P}$ at each iteration. We also observe that variables that have remained in the same set throughout several iterations have a lower propensity for future exchanges. This leads to an efficient algorithm that does not recompute the entire QR decomposition at each step but rather modifies the previous $Q$ and $R$ matrices with regard to two cases: a column is inserted into $A^{P}$ when a variable enters the passive-set (update), or a column is removed from $A^{P}$ when a variable returns to the active-set (downdate).
Our update procedure is based on the modified Gram-Schmidt algorithm for orthogonalizing the inserted column with respect to all the existing columns in $Q$. For $k$ passive columns, the time-complexity of the update step is $O(mk)$; the parallel time-complexity is $O(k \log m)$, since each of the $k$ dot products becomes a parallel reduction over $m$ elements. The downdate procedure involves a series of Givens rotations that introduce zeros into a single row of $Q$. The time-complexity of the downdate step is $O(mk)$; the parallel time-complexity is $O(k)$, since each rotation applies elementwise across rows.
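The sketches below illustrate both modifications on a reduced factorization $Q$ ($m \times k$) and $R$ ($k \times k$); the insertion follows the modified Gram-Schmidt update described above, while the removal shown is the common Givens-based column-deletion downdate, an equivalent re-triangularization of $R$ rather than the row-of-$Q$ formulation used in our implementation. The notation and helper names are ours:

\begin{verbatim}
# Minimal sketches of the two QR modifications.
import numpy as np

def qr_insert_column(Q, R, a):
    """Append column a to A^P = Q R via modified Gram-Schmidt."""
    k = R.shape[1]
    r = np.zeros(k + 1)
    v = a.astype(float).copy()
    for i in range(k):            # orthogonalize against existing columns
        r[i] = Q[:, i] @ v        # each dot product is a parallel reduction
        v -= r[i] * Q[:, i]
    r[k] = np.linalg.norm(v)
    Q = np.column_stack([Q, v / r[k]])
    R = np.block([[R, r[:k, None]],
                  [np.zeros((1, k)), r[k:k+1, None]]])
    return Q, R

def qr_delete_column(Q, R, j):
    """Remove column j of A^P, restoring triangular R with Givens rotations."""
    Q = Q.copy()
    R = np.delete(R, j, axis=1)   # R now has subdiagonal entries from column j
    k = R.shape[1]
    for i in range(j, k):
        c, s = R[i, i], R[i + 1, i]
        h = np.hypot(c, s)
        G = np.array([[c / h, s / h], [-s / h, c / h]])
        R[i:i+2, i:] = G @ R[i:i+2, i:]   # zero the subdiagonal entry
        Q[:, i:i+2] = Q[:, i:i+2] @ G.T   # keep Q R equal to the new A^P
    return Q[:, :k], R[:k, :]
\end{verbatim}

In both routines the only primitives are dot products and elementwise column operations, which is what makes the steps amenable to parallel reductions and elementwise GPU kernels.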
We implement our algorithm on NVIDIA's Compute Unified Device Architecture (CUDA). For comparison, Matlab's built-in lsqnonneg routine implements a version of the Lawson and Hanson active-set algorithm that solves the sub-problem via a full QR decomposition based on Intel's optimized Math Kernel Library code-base. Other active-set variants in the literature include the Fast NNLS (FNNLS) algorithm in ref. [1] and the Projective Quasi-Newton NNLS (PQN-NNLS) algorithm in ref. [2]. For experimental results, we apply the listed algorithms to a deconvolution problem with data obtained from terrain laser imaging. We show that our algorithm achieves a moderate speed-up over the lsqnonneg routine and a substantial speed-up over the FNNLS and PQN-NNLS algorithms for our data-set.
References
[1] R. Bro and S. De Jong, A fast non-negativity-constrained least squares algorithm, Journal of Chemometrics, 11 (1997), pp. 393-401.
[2] D. Kim, S. Sra, and I. S. Dhillon, A new projected quasi-Newton approach for the nonnegative least squares problem, Technical Report TR-06-54, Department of Computer Sciences, The University of Texas at Austin, 2006.
[3] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, NJ, 1974.