A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Authors: Quintana-Ortí, Gregorio; Igual, Francisco D.; Marqués-Andrés, Mercedes; Quintana-Ortí, Enrique S.; Van de Geijn, Robert A.
This resource is restricted
http://dx.doi.org/10.1145/2331130.2331133
Metadata
Title
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Publication date
2012-08
Publisher
ACM
Bibliographic citation
ACM Transactions on Mathematical Software (TOMS), 38, 4, article 25
Document type
info:eu-repo/semantics/article
Publisher's version
http://dl.acm.org/citation.cfm?id=2331130&picked=prox&preflayout=tabs
Abstract
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This comes from the realizations that memory is cheap and large, making it less necessary to optimally orchestrate I/O, and that new algorithms view matrices as collections of submatrices and computation as operations with those submatrices. This enables libraries to be coded at a high level of abstraction, leaving the tasks of scheduling the computations and data movement in the hands of a runtime system. This is in sharp contrast to more traditional approaches that leverage optimal use of in-core memory and, at the expense of introducing considerable programming complexity, explicit overlap of I/O with computation. Performance is demonstrated for this approach on multicore architectures as well as platforms equipped with hardware accelerators.
Access rights
Copyright 2012 ACM
http://rightsstatements.org/vocab/InC/1.0/
info:eu-repo/semantics/restrictedAccess
Appears in collections
- ICC_Articles [413]