Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs
Other documents by the authors: Leon, German; Badia, Jose M.; Belloch, Jose A.; Lindoso, Almudena; Entrena, Luis
Metadata
Title
Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs
Publication date
2024-02-25
Publisher
Springer
ISSN
0920-8542; 1573-0484
Bibliographic citation
Leon, G., Badia, J.M., Belloch, J.A. et al. Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05925-0
Document type
info:eu-repo/semantics/article
Version
info:eu-repo/semantics/publishedVersion
Abstract
Graphics processing units (GPUs) have become integral to embedded systems and
supercomputing centres due to their large memory, cutting-edge technology and
high performance per watt. However, their susceptibility to transient errors requires
a comprehensive analysis of error sensitivity, as well as the development of error
mitigation techniques and fault-tolerant algorithms. This study focuses on evaluating
the soft-error sensitivity of two distinct versions of LU decomposition algorithms
implemented on two very different GPUs: a low-power SoC embedded GPU and
a high-performance massively parallel GPU. Through extensive fault injection
campaigns on both GPUs, we examine the vulnerability of the algorithms, identify error
causes, and determine critical code components requiring enhanced protection. The
experiments reveal that most single-bit-flip fault injections in the instruction results
lead to erroneous outcomes or unrecoverable errors. Notably, efficient GPU resource
utilisation can increase the number of masked errors, thereby enhancing error
resilience. Additionally, while different parts of the code exhibit similar error occurrence
types and rates, the propagation of errors to elements within the result matrix differs
significantly.
Related data
No additional data or materials are available.
Funding body
Gobierno de España | Gobierno Regional de Madrid | CRUE-CSIC agreement with Springer Nature
Project or grant code
PID2020-113656RB-C21 | PID2022-138696OB-C21 | PID2022-1370480A-C43 | MIMACUHSPACE-CM-UC3M
Access rights
© The Author(s) 2024
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [430]