dc.contributor.author | Anzt, Hartwig
dc.contributor.author | Dongarra, Jack
dc.contributor.author | Flegar, Goran
dc.contributor.author | Quintana-Orti, Enrique S.
dc.date.accessioned | 2019-05-16T10:50:39Z
dc.date.available | 2019-05-16T10:50:39Z
dc.date.issued | 2019
dc.identifier.citation | ANZT, Hartwig, et al. Variable-size batched Gauss–Jordan elimination for block-Jacobi preconditioning on graphics processors. Parallel Computing, 2019, vol. 81, p. 131-146. | ca_CA
dc.identifier.issn | 0167-8191
dc.identifier.uri | http://hdl.handle.net/10234/182511
dc.description.abstract | In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix–vector multiplication kernel that transforms the linear systems’ right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA’s K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. | ca_CA
dc.format.extent | 16 p. | ca_CA
dc.language.iso | eng | ca_CA
dc.publisher | Elsevier | ca_CA
dc.relation.isPartOf | Parallel Computing, Volume 81, January 2019. | ca_CA
dc.rights | 0167-8191/© 2018 Elsevier B.V. All rights reserved. | ca_CA
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | *
dc.subject | Batched algorithms | ca_CA
dc.subject | Matrix inversion | ca_CA
dc.subject | Gauss–Jordan elimination | ca_CA
dc.subject | Block-Jacobi | ca_CA
dc.subject | Sparse linear systems | ca_CA
dc.subject | Graphics processor | ca_CA
dc.title | Variable-size batched Gauss–Jordan elimination for block-Jacobi preconditioning on graphics processors | ca_CA
dc.type | info:eu-repo/semantics/article | ca_CA
dc.identifier.doi | https://doi.org/10.1016/j.parco.2017.12.006
dc.relation.projectID | DE-SC-0010042 ; VH-NG-1241 ; TIN2014-53495-R ; 732631 | ca_CA
dc.rights.accessRights | info:eu-repo/semantics/restrictedAccess | ca_CA
dc.relation.publisherVersion | https://www.sciencedirect.com/science/article/pii/S0167819117302107 | ca_CA
dc.type.version | info:eu-repo/semantics/publishedVersion | ca_CA


Files in this item

There are no files associated with this item.
