A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Authors: Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Ortí, Enrique S.; Van de Geijn, Robert A.
Metadata
http://dx.doi.org/10.1145/2331130.2331133
Title
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Date
2012-08
Publisher
ACM
Bibliographic citation
ACM Transactions on Mathematical Software (TOMS), 38, 4, article 25
Type
info:eu-repo/semantics/article
Publisher version
http://dl.acm.org/citation.cfm?id=2331130&picked=prox&preflayout=tabs
Abstract
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we
show how the current state of hardware and software allows the programmability problem to be addressed
without sacrificing performance. This comes from the realizations that memory is cheap and large, making it less necessary to optimally orchestrate I/O, and that new algorithms view matrices as collections of
submatrices and computation as operations with those submatrices. This enables libraries to be coded at a
high level of abstraction, leaving the tasks of scheduling the computations and data movement in the hands
of a runtime system. This is in sharp contrast to more traditional approaches that leverage optimal use of
in-core memory and, at the expense of introducing considerable programming complexity, explicit overlap of
I/O with computation. Performance is demonstrated for this approach on multicore architectures as well as
platforms equipped with hardware accelerators.
Rights
Copyright 2012 ACM
http://rightsstatements.org/vocab/InC/1.0/
info:eu-repo/semantics/restrictedAccess
This item appears in the following collection(s)
- ICC_Articles [430]