As architectures become more complex, failures play an increasing role and simulation behavior becomes less predictable. Especially worrisome is the emerging prevalence of soft errors, i.e., bit flips, caused by systems operating at low voltages. Soft errors are especially insidious because they may go undetected. In this work, we investigate the sensitivity of several linear solvers to soft errors.
In many simulations, linear algebra accounts for more than 80% of the computational time, making it a critical component of overall fault tolerance. However, the linear solver does not function in isolation and must be considered as an integrated component of the simulation's algorithmic stack. We investigate the effects of soft errors in the linear solver on the overall simulation.