Tuning stationary iterative solvers for fault resilience
This resource is restricted
http://dx.doi.org/10.1145/2832080.2832081
Metadata
Title
Tuning stationary iterative solvers for fault resilience
Publication date
2015
Publisher
ACM. Association for Computing Machinery
ISBN
978-1-4503-4011-3
Bibliographic citation
Anzt, H., Dongarra, J., & Quintana-Ortí, E. S. (2015, November). Tuning stationary iterative solvers for fault resilience. In Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (p. 1). ACM.
Document type
info:eu-repo/semantics/conferenceObject
Publisher's version
http://dl.acm.org/citation.cfm?id=2832081
Abstract
As the transistor’s feature size decreases following Moore’s Law,
hardware will become more prone to permanent, intermittent, and
transient errors, increasing the number of failures experienced by
applications, and diminishing the confidence of users. As a result,
resilience is considered the most difficult under-addressed issue
faced by the High Performance Computing community.
In this paper, we address the design of error-resilient iterative
solvers for sparse linear systems. Contrary to most previous
approaches, based on Krylov subspace methods, for this purpose we
analyze stationary component-wise relaxation. Concretely, starting
from a plain implementation of the Jacobi iteration, we design a
low-cost component-wise technique that elegantly handles bit-flips,
turning the initial synchronized solver into an asynchronous
iteration. Our experimental study employs sparse incomplete
factorizations from several practical applications to expose the
convergence delay incurred by the fault-tolerant implementation.
Description
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '15)