===firstname: Yves ===firstname3: ===affil6: ===lastname3: ===email: yves.robert@inria.fr ===keyword_other2: exascale ===lastname6: ===affil5: ===lastname4: ===lastname7: ===affil7: ===postal: Laboratoire LIP ENS Lyon 69364 Lyon Cedex 07, France ===ABSTRACT: This work focuses on resilience techniques at extreme scale that deal with fail-stop and silent errors simultaneously. We present a unified framework and optimal algorithmic solutions to cope with both error sources. Silent errors are handled via verification mechanisms (either partially or fully accurate) and in-memory checkpoints. Fail-stop errors are handled via disk checkpoints. All verification and checkpoint types are combined into computational patterns. We provide a unified model and a full characterization of the optimal pattern. Our results extend several published solutions. We instantiate the model for sparse iterative solvers and discuss several application-specific error detection and correction mechanisms, including partial recomputations, orthogonality checks and ABFT. Joint work with Anne Benoit, Aurelien Cavelan, Massimiliano Fasi, Julien Langou, Hongyang Sun and Bora Ucar. ===affil3: ===title: Optimal resilience patterns to cope with fail-stop and silent errors -- application to sparse iterative solvers ===affil2: ===lastname2: ===firstname4: ===keyword1: Iterative solvers/linear algebra on high concurrency node architectures ===workshop: no ===lastname: Robert ===firstname5: ===keyword2: NOT_SPECIFIED ===otherauths: ===affil4: ===competition: no ===firstname7: ===firstname6: ===keyword_other1: exascale ===lastname5: ===affilother: ===firstname2: