Programming matrix algorithms-by-blocks for thread-level parallelism

Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A.; Van Zee, Field G.; Chan, Ernie

dc.contributor.author	Quintana-Ortí, Gregorio
dc.contributor.author	Quintana-Orti, Enrique S.
dc.contributor.author	Van de Geijn, Robert A.
dc.contributor.author	Van Zee, Field G.
dc.contributor.author	Chan, Ernie
dc.date.accessioned	2011-05-20T07:27:29Z
dc.date.available	2011-05-20T07:27:29Z
dc.date.issued	2009-07
dc.identifier.citation	QUINTANA-ORTÍ, Gregorio, et al. Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software (TOMS), 2009, vol. 36, no. 3, p. 1-26.
dc.identifier.issn	0098-3500
dc.identifier.uri	http://hdl.handle.net/10234/22583
dc.description.abstract	With the emergence of thread-level parallelism as the primary means for continued improvement of performance, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that evolving legacy libraries for dense and banded linear algebra is not a viable solution due to constraints imposed by early design decisions. We propose a philosophy of abstraction and separation of concerns that provides a promising solution in this problem domain. The first abstraction, FLASH, allows algorithms to express computation with matrices consisting of blocks, facilitating algorithms-by-blocks. Transparent to the library implementor, operand descriptions are registered for a particular operation a priori. A runtime system, SuperMatrix, uses this information to identify data dependencies between suboperations, allowing them to be scheduled to threads out-of-order and executed in parallel. But not all classical algorithms in linear algebra lend themselves to conversion to algorithms-by-blocks. We show how our recently proposed LU factorization with incremental pivoting and closely related algorithm-by-blocks for the QR factorization, both originally designed for out-of-core computation, overcome this difficulty. Anecdotal evidence regarding the development of routines with a core functionality demonstrates how the methodology supports high productivity while experimental results suggest that high performance is abundantly achievable.	ca_CA
dc.format.extent	26 p.
dc.language.iso	eng	ca_CA
dc.publisher	Association for Computing Machinery	ca_CA
dc.relation.isFormatOf	Pre-print version of the document published at: http://portal.acm.org/citation.cfm?id=J782
dc.relation.isPartOf	ACM transactions on mathematical software, 2009, vol. 36, no. 3
dc.rights	© Association for Computing Machinery
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	Algorithms	ca_CA
dc.subject	Performance	ca_CA
dc.subject	Linear algebra	ca_CA
dc.subject	Libraries	ca_CA
dc.subject	High-performance	ca_CA
dc.subject	Multithreaded	ca_CA
dc.subject.lcsh	Computer algorithms
dc.subject.other	Computer algorithms
dc.title	Programming matrix algorithms-by-blocks for thread-level parallelism	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.rights.accessRights	info:eu-repo/semantics/openAccess
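The abstract summarizes how SuperMatrix turns an algorithm-by-blocks into parallel work: the operands of each suboperation are registered in advance, data dependencies between suboperations are derived from those operands, and suboperations run out of order as soon as their inputs are ready. The following is a minimal, hypothetical sketch of that idea in Python; it is not the FLASH or SuperMatrix interface. It assumes a right-looking Cholesky factorization as the example operation (chosen here for illustration) and, to stay within the standard library, treats each "block" as a 1x1 scalar, so the CHOL, TRSM, SYRK and GEMM kernels reduce to a square root, a division and multiply-subtract updates. The names cholesky_tasks, run_block_op, build_dependencies and execute are illustrative only.

```python
import math
import threading


def cholesky_tasks(nb):
    """Hypothetical helper: list the suboperations of a right-looking
    Cholesky-by-blocks on the lower triangle of an nb x nb grid of blocks,
    in sequential program order. Each task is (op, reads, read_writes),
    where op = (kind, i, j, k) updates block (i, j)."""
    tasks = []
    for k in range(nb):
        tasks.append((("CHOL", k, k, k), set(), {(k, k)}))
        for i in range(k + 1, nb):
            tasks.append((("TRSM", i, k, k), {(k, k)}, {(i, k)}))
        for i in range(k + 1, nb):
            tasks.append((("SYRK", i, i, k), {(i, k)}, {(i, i)}))
            for j in range(k + 1, i):
                tasks.append((("GEMM", i, j, k), {(i, k), (j, k)}, {(i, j)}))
    return tasks


def run_block_op(op, A):
    """Apply one suboperation in place; with 1x1 blocks each kernel is scalar."""
    kind, i, j, k = op
    if kind == "CHOL":
        A[k][k] = math.sqrt(A[k][k])
    elif kind == "TRSM":
        A[i][k] /= A[k][k]
    elif kind == "SYRK":
        A[i][i] -= A[i][k] * A[i][k]
    else:  # GEMM
        A[i][j] -= A[i][k] * A[j][k]


def build_dependencies(tasks):
    """For each task, return the set of earlier tasks it must wait for."""
    last_writer = {}  # block -> index of the most recent task that updated it
    readers = {}      # block -> tasks that read it since that update
    deps = [set() for _ in tasks]
    for t, (_, reads, read_writes) in enumerate(tasks):
        for blk in reads | read_writes:
            if blk in last_writer:
                deps[t].add(last_writer[blk])  # read-after-write (and write-after-write)
        for blk in read_writes:
            deps[t].update(readers.get(blk, []))  # write-after-read
        for blk in reads:
            readers.setdefault(blk, []).append(t)
        for blk in read_writes:
            last_writer[blk] = t
            readers[blk] = []
    return deps


def execute(tasks, A, num_threads=4):
    """Run tasks on worker threads as soon as their dependencies have
    completed (out of program order). Returns the completion order."""
    deps = build_dependencies(tasks)
    successors = [[] for _ in tasks]
    for t, d in enumerate(deps):
        for s in d:
            successors[s].append(t)
    remaining = [len(d) for d in deps]
    ready = [t for t, d in enumerate(deps) if not d]
    completed = []
    cond = threading.Condition()

    def worker():
        while True:
            with cond:
                while not ready and len(completed) < len(tasks):
                    cond.wait()
                if len(completed) == len(tasks):
                    return
                t = ready.pop()
            run_block_op(tasks[t][0], A)  # executed outside the lock
            with cond:
                completed.append(t)
                for u in successors[t]:
                    remaining[u] -= 1
                    if remaining[u] == 0:
                        ready.append(u)
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return completed


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
order = execute(cholesky_tasks(3), A, num_threads=3)
print([[A[i][j] for j in range(i + 1)] for i in range(3)])
# prints [[2.0], [6.0, 1.0], [-8.0, 5.0, 3.0]]
```

Because every task declares which blocks it only reads and which it updates, the dependency analysis needs no knowledge of the operation itself; the algorithm is written once in program order and the scheduler decides the execution order. In CPython the global interpreter lock serializes this scalar arithmetic, so the sketch illustrates the scheduling logic rather than a speedup.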

Files in this item

Name: 34980.pdf
Size: 324.2 KB
Format: PDF
Description: pre-print version

This item appears in the following collection(s)

ICC_Articles