A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting
Authors: Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael; Van de Geijn, Robert A.
Metadata
Title
A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting
Publication date
2019-01
Publisher
IEEE
Bibliographic citation
CATALÁN, Sandra, et al. A case for malleable thread-level linear algebra libraries: The LU factorization with partial pivoting. IEEE Access, 2019, 7: 17617-17633.
Document type
info:eu-repo/semantics/article
Publisher's version
https://ieeexplore.ieee.org/abstract/document/8630926
Version
info:eu-repo/semantics/publishedVersion
Abstract
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario where two thread teams are created/activated during the factorization, with each team in charge of performing an independent task/branch of execution. The first technique promotes worker sharing (WS) between the two tasks, allowing the threads of the task that completes first to be reallocated for use by the costlier task. The second technique allows a fast task to alert the slower task of completion, enforcing the early termination (ET) of the second task, and a smooth transition of the factorization procedure into the next iteration. The two mechanisms are instantiated via a new malleable thread-level implementation of the basic linear algebra subprograms, and their benefits are illustrated via an implementation of the LU factorization with partial pivoting enhanced with look-ahead. Concretely, our experimental results on an Intel-Xeon system with 12 cores show the benefits of combining WS+ET, reporting competitive performance in comparison with a task-parallel runtime-based solution.
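The early-termination (ET) idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract says is built on a malleable thread-level BLAS; the names `slow_task` and `run_iteration`, and the use of `threading.Event` as the alert, are assumptions made purely for illustration. The slower task checks a shared flag between units of work and reports how many units it finished, so the next iteration knows where to resume.

```python
import threading

def slow_task(total_steps, stop_event, work=lambda step: None):
    """Run up to total_steps units of work, checking for an alert between units.

    Returns the number of units actually completed, so the caller knows
    where the next iteration has to pick up. (Hypothetical helper.)
    """
    completed = 0
    for step in range(total_steps):
        if stop_event.is_set():   # the fast task has finished: stop early
            break
        work(step)
        completed += 1
    return completed

def run_iteration(total_steps, fast_work, slow_work=lambda step: None):
    """Run a fast and a slow task side by side; the fast one alerts the slow one.

    (Hypothetical driver; one thread stands in for each thread team.)
    """
    stop_event = threading.Event()
    result = {}

    def fast():
        fast_work()
        stop_event.set()          # alert: the fast task is complete

    def slow():
        result["completed"] = slow_task(total_steps, stop_event, slow_work)

    threads = [threading.Thread(target=fast), threading.Thread(target=slow)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["completed"]
```

Calling `run_iteration(1000, lambda: None)` returns some count between 0 and 1000, depending on how quickly the alert arrives. Checking the flag only between units of work keeps each unit atomic, which is one plausible way to obtain the smooth transition into the next iteration that the abstract mentions.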
Research project
Spanish Ministerio de Economía y Competitividad (Project TIN2014-53495-R, Project TIN2015-65316-P, and Project TIN2017-82972-R); H2020 EU FETHPC "INTERTWinE" (Project 671602); Generalitat de Catalunya (Project 2017-SGR-1414); NSF (Grant ACI-1550493)
Access rights
© Copyright 2019 IEEE - All rights reserved.
http://rightsstatements.org/vocab/InC/1.0/
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [417]