Programming parallel dense matrix factorizations with look-ahead and OpenMP
Authors: Catalán, Sandra; Castelló, Adrián; Igual, Francisco; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S.
Metadata
Title
Programming parallel dense matrix factorizations with look-ahead and OpenMP
Publication date
2019
Publisher
Springer
ISSN
1386-7857; 1573-7543
Bibliographic citation
CATALÁN, Sandra, et al. Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Computing, 2019
Document type
info:eu-repo/semantics/article
Publisher's version
https://link.springer.com/article/10.1007/s10586-019-02927-z
Version
info:eu-repo/semantics/submittedVersion
Abstract
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the way to a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime.
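The static look-ahead idea described in the abstract can be sketched as follows: at each step of a blocked factorization, the next panel is factored concurrently with the update of the remainder of the trailing matrix, hiding the panel bottleneck behind the update. This is only an illustrative sketch, not the authors' code: it is written in Python with a thread pool rather than C/OpenMP, uses LU without pivoting for brevity, and all helper names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def panel_factor(A, k, b, n):
    # Unblocked LU (no pivoting) on the panel A[k:n, k:k+b].
    for j in range(k, min(k + b, n)):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
            for c in range(j + 1, min(k + b, n)):
                A[i][c] -= A[i][j] * A[j][c]

def row_solve(A, k, b, c0, c1):
    # U12 = inv(L11) * A12: unit lower-triangular solve for columns c0:c1.
    for c in range(c0, c1):
        for j in range(k, k + b):
            for i in range(j + 1, k + b):
                A[i][c] -= A[i][j] * A[j][c]

def trailing_update(A, k, b, n, c0, c1):
    # A22[:, c0:c1] -= L21 @ U12[:, c0:c1]  (the GEMM-like update).
    for i in range(k + b, n):
        for c in range(c0, c1):
            s = 0.0
            for p in range(k, k + b):
                s += A[i][p] * A[p][c]
            A[i][c] -= s

def lu_lookahead(A, b):
    # Blocked right-looking LU with static look-ahead of depth 1.
    n = len(A)
    panel_factor(A, 0, b, n)
    with ThreadPoolExecutor(max_workers=2) as ex:
        for k in range(0, n, b):
            nxt = k + b
            if nxt >= n:
                break
            row_solve(A, k, b, nxt, n)
            # Update only the columns of the *next* panel first, so that
            # its factorization can start immediately ...
            trailing_update(A, k, b, n, nxt, min(nxt + b, n))
            # ... then factor that panel concurrently with the update of
            # the rest of the trailing matrix (disjoint columns: no race).
            fut = ex.submit(panel_factor, A, nxt, b, n)
            trailing_update(A, k, b, n, min(nxt + b, n), n)
            fut.result()

def reconstruct_error(orig, F):
    # Max |(L*U)[i][j] - orig[i][j]| with L unit lower, U upper, packed in F.
    n = len(F)
    err = 0.0
    for i in range(n):
        for j in range(n):
            s = 0.0
            for p in range(min(i, j) + 1):
                s += (F[i][p] if p < i else 1.0) * F[p][j]
            err = max(err, abs(s - orig[i][j]))
    return err
```

In a C/OpenMP setting the same structure maps onto two explicit thread teams (a small one for the panel, a large one for the cache-aware BLAS update), which is the gist of embedding look-ahead statically instead of delegating dependency discovery to a tasking runtime.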
Published in
Cluster Computing, 2019
Appears in collections
- ICC_Articles [427]