In this talk, we consider VBARMS, a recently proposed variable-block variant of the ARMS preconditioner for solving general nonsymmetric linear systems. This preconditioner can automatically detect exact or approximate dense structures in a matrix to improve reliability and throughput during factorization. We show how offloading parts of VBARMS to Xeon Phi accelerators may speed up the preconditioner construction by exposing a high degree of parallelism. We compare the resulting performance and convergence with a parallel VBARMS and distributed Schur-complement preconditioner on large turbulent-flow test cases.