Programming parallel dense matrix factorizations with look-ahead and OpenMP
Metadata
Title
Programming parallel dense matrix factorizations with look-ahead and OpenMP
Date
2019
Publisher
Springer
ISSN
1386-7857; 1573-7543
Bibliographic citation
CATALÁN, Sandra, et al. Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Computing, 2019
Type
info:eu-repo/semantics/article
Publisher version
https://link.springer.com/article/10.1007/s10586-019-02927-z
Version
info:eu-repo/semantics/submittedVersion
Abstract
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime.
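The static look-ahead idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes an LU factorization without pivoting, uses plain Python lists instead of BLAS calls, and runs the two branches of each iteration one after the other rather than on separate OpenMP thread teams. All function names (`factor_panel`, `apply_panel`, `lu_lookahead`, `max_residual`) are hypothetical. What it does show is the reordering itself: in each iteration, the next panel is updated and factored in one branch, while the rest of the trailing matrix is updated in another, and the two branches write to disjoint column ranges.

```python
def factor_panel(A, k, kb):
    """Unblocked LU (no pivoting) of the panel A[k:n, k:k+kb], in place."""
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                # multiplier, stored as L[i][j]
            for c in range(j + 1, k + kb):    # eliminate inside the panel only
                A[i][c] -= A[i][j] * A[j][c]


def apply_panel(A, k, kb, c0, c1):
    """Apply the already factored panel k to columns c0..c1-1.

    For rows inside the panel block this is the triangular solve; for rows
    below it, this is the trailing (matrix-multiply) update.
    """
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            lij = A[i][j]
            for c in range(c0, c1):
                A[i][c] -= lij * A[j][c]


def lu_lookahead(A, b):
    """Blocked right-looking LU with a one-panel static look-ahead (sketch)."""
    n = len(A)
    factor_panel(A, 0, min(b, n))             # the first panel has no predecessor
    for k in range(0, n, b):
        kb = min(b, n - k)
        c0 = k + kb                           # first column of the next panel
        c1 = min(c0 + b, n)                   # one past its last column
        # Branch 1 (look-ahead): update the next panel, then factor it.
        apply_panel(A, k, kb, c0, c1)
        if c0 < n:
            factor_panel(A, c0, c1 - c0)
        # Branch 2 (remainder update): touches columns c1..n-1 only, so in a
        # parallel code it could run concurrently with branch 1.
        apply_panel(A, k, kb, c1, n)
    return A


def max_residual(A0, LU):
    """Largest entry of |A0 - L*U|, with L unit lower and U upper triangular."""
    n = len(A0)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = 0.0
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else LU[i][p]
                s += l_ip * LU[p][j]
            worst = max(worst, abs(A0[i][j] - s))
    return worst


# Diagonally dominant test matrix, so skipping pivoting is safe here.
n = 7
A0 = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)]
      for i in range(n)]
LU = lu_lookahead([row[:] for row in A0], 3)
print(max_residual(A0, LU))
```

Because branch 1 only writes the columns of the next panel and branch 2 only writes the columns to its right, the panel factorization no longer sits alone on the critical path, which is the bottleneck the abstract refers to. A practical code would add pivoting and replace `apply_panel` with calls to a multi-threaded BLAS.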
Is part of
Cluster Computing, 2019
This item appears in the following collection(s)
- ICC_Articles [425]