dc.contributor.author: Quintana-Ortí, Gregorio
dc.contributor.author: Quintana-Ortí, Enrique S.
dc.contributor.author: Van de Geijn, Robert A.
dc.contributor.author: Van Zee, Field G.
dc.contributor.author: Chan, Ernie
dc.date.accessioned: 2011-05-20T07:27:29Z
dc.date.available: 2011-05-20T07:27:29Z
dc.date.issued: 2009-07
dc.identifier.citation: QUINTANA-ORTÍ, Gregorio, et al. Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software (TOMS), 2009, vol. 36, no. 3, p. 1-26.
dc.identifier.issn: 0098-3500
dc.identifier.uri: http://hdl.handle.net/10234/22583
dc.description.abstract: With the emergence of thread-level parallelism as the primary means for continued improvement of performance, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that evolving legacy libraries for dense and banded linear algebra is not a viable solution due to constraints imposed by early design decisions. We propose a philosophy of abstraction and separation of concerns that provides a promising solution in this problem domain. The first abstraction, FLASH, allows algorithms to express computation with matrices consisting of blocks, facilitating algorithms-by-blocks. Transparent to the library implementor, operand descriptions are registered for a particular operation a priori. A runtime system, SuperMatrix, uses this information to identify data dependencies between suboperations, allowing them to be scheduled to threads out-of-order and executed in parallel. But not all classical algorithms in linear algebra lend themselves to conversion to algorithms-by-blocks. We show how our recently proposed LU factorization with incremental pivoting and closely related algorithm-by-blocks for the QR factorization, both originally designed for out-of-core computation, overcome this difficulty. Anecdotal evidence regarding the development of routines with a core functionality demonstrates how the methodology supports high productivity while experimental results suggest that high performance is abundantly achievable.
dc.format.extent: 26 p.
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.relation.isFormatOf: Preprint version of the document published at: http://portal.acm.org/citation.cfm?id=J782
dc.relation.isPartOf: ACM Transactions on Mathematical Software, 2009, vol. 36, no. 3
dc.rights: © Association for Computing Machinery
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Algorithms
dc.subject: Performance
dc.subject: Linear algebra
dc.subject: Libraries
dc.subject: High-performance
dc.subject: Multithreaded
dc.subject.lcsh: Computer algorithms
dc.subject.other: Algorismes computacionals
dc.title: Programming matrix algorithms-by-blocks for thread-level parallelism
dc.type: info:eu-repo/semantics/article
dc.rights.accessRights: info:eu-repo/semantics/openAccess

