Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units

Aliaga Estellés, José Ignacio; Anzt, Hartwig; Grützmacher, Thomas; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique

dc.contributor.author	Aliaga Estellés, José Ignacio
dc.contributor.author	Anzt, Hartwig
dc.contributor.author	Grützmacher, Thomas
dc.contributor.author	Quintana-Orti, Enrique S.
dc.contributor.author	Tomás Domínguez, Andrés Enrique
dc.date.accessioned	2021-11-04T08:12:21Z
dc.date.available	2021-11-04T08:12:21Z
dc.date.issued	2021
dc.identifier.citation	Aliaga, JI, Anzt, H, Grützmacher, T, Quintana-Ortí, ES, Tomás, AE. Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units. Concurrency Computat Pract Exper. 2021;e6515. https://doi.org/10.1002/cpe.6515	ca_CA
dc.identifier.issn	1532-0634
dc.identifier.issn	1532-0626
dc.identifier.uri	http://hdl.handle.net/10234/195374
dc.description.abstract	We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the numerical information. Our approach is multi-platform, in the sense that the realizations for (general-purpose) multicore processors as well as graphics accelerators (GPUs) are built upon common principles, but differ in the implementation details, which are adapted to avoid thread divergence in the GPU case or maximize compression element-wise (i.e., for each matrix entry) for multicore architectures. Our evaluation on the last two generations of NVIDIA GPUs as well as Intel and AMD processors demonstrates the benefits of the new kernels when compared with the optimized implementations of the sparse matrix-vector product in NVIDIA's cuSPARSE and Intel's MKL, respectively.	ca_CA
dc.format.extent	13 p.	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	John Wiley and Sons	ca_CA
dc.relation.isPartOf	Concurrency and Computation: Practice and Experience, 2021	ca_CA
dc.rights	Copyright © John Wiley & Sons, Inc.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/CNE/1.0/	ca_CA
dc.subject	compression	ca_CA
dc.subject	coordinate sparse matrix format	ca_CA
dc.subject	graphics processing units (GPUs)	ca_CA
dc.subject	multicore processors (CPUs)	ca_CA
dc.subject	sparse matrix-vector product	ca_CA
dc.subject	workload balancing	ca_CA
dc.title	Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1002/cpe.6515
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6515	ca_CA
dc.description.sponsorship	J. I. Aliaga, E. S. Quintana-Ortí, and A. E. Tomás were supported by TIN2017-82972-R of the Spanish MINECO. H. Anzt and T. Grützmacher were supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241 and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology for providing access to an NVIDIA A100 GPU.
dc.type.version	info:eu-repo/semantics/acceptedVersion	ca_CA
project.funder.name	Ministerio de Ciencia, Innovación y Universidades (España)	ca_CA
project.funder.name	Helmholtz Association	ca_CA
project.funder.name	United States Department of Energy (DOE)	ca_CA
oaire.awardNumber	TIN2017-82972	ca_CA
oaire.awardNumber	VH-NG-1241	ca_CA
oaire.awardNumber	17-SC-20-SC	ca_CA

Files in this item

Name: 77012.pdf
Size: 341.7Kb
Format: PDF
Description: Post-print version

This item appears in the following collection(s)

ICC_Articles [423]