Algorithms based on kernel methods play a central role in statistical machine learning. At their core are a number of linear algebra operations on matrices of kernel functions which take as arguments the training and testing data. A kernel function generalizes the notion of the similarity between a test and training point. Given a set of data points, , the kernel matrix is given by,
When the underlying system is ill-conditioned, there is a degradation of the performance of iterative approaches, necessitating the use of a preconditioner for the Krylov iterations. Popular preconditioners like Jacobi, SSOR, ILU can be used to improve the convergence, however, these preconditioners have an O() space and computation requirement for dense matrices, which would ruin any advantage gained by the fast matrix-vector products. The preconditioner must have a representation that allows for a fast matrix-vector product just like the kernel matrix. We propose a Tikhonov regularized kernel matrix solved with a truncated conjugate gradient algorithm as a preconditioner for the kernel matrix, which can be accelerated using the aforementioned approaches. Because the symmetry of this preconditioner cannot be guaranteed, we use a flexible GMRES (Saad, ) algorithm.
A good preconditioner will improve the convergence of the iterative approach at the expense of an increased cost per iteration. The convergence with proposed preconditioner is shown to be an order of magnitude better than the unpreconditioned approach. But, for a preconditioner to be useful, the total time taken by the preconditioned approach should be less than the unpreconditioned approach. This is achieved by using the the fast matrix-vector product algorithms, resulting in a computationally efficient solver with faster convergence. We use two classes of fast matrix vector products, approximation-based acceleration (eg. FIGTREE (Morariu et al., ) for Gaussian kernel) and GPU-based parallelization (Srinivasan et al, ). Each of these two approaches have their own advantages and disadvantages. We propose a strategy to choose the appropriate acceleration in the light of the desired accuracy.
The performance is further illustrated in popular learning approaches namely, radial basis function interpolation, Gaussian process regression and kernel classification. There is an improvement of upto X in the number of iterations to converge and X in the total time taken compared to a conjugate gradient based approach. The core preconditioning strategy proposed here will soon be released as an open source package.