Load-balancing Sparse Matrix Vector Product Kernels on GPUs

Anzt, Hartwig; Cojean, Terry; Yen-Chen, Chen; Dongarra, Jack; Flegar, Goran; Nayak, Pratik; Tomov, Stanimire; Tsai, Yuhsiang M.; Wang, Weichung

dc.contributor.author	Anzt, Hartwig
dc.contributor.author	Cojean, Terry
dc.contributor.author	Yen-Chen, Chen
dc.contributor.author	Dongarra, Jack
dc.contributor.author	Flegar, Goran
dc.contributor.author	Nayak, Pratik
dc.contributor.author	Tomov, Stanimire
dc.contributor.author	Tsai, Yuhsiang M.
dc.contributor.author	Wang, Weichung
dc.date.accessioned	2020-07-28T08:00:45Z
dc.date.available	2020-07-28T08:00:45Z
dc.date.issued	2020-03
dc.identifier.citation	Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, and Weichung Wang. 2020. Load-balancing Sparse Matrix Vector Product Kernels on GPUs. ACM Trans. Parallel Comput. 7, 1, Article 2 (March 2020), 26 pages. DOI:https://doi.org/10.1145/3380930	ca_CA
dc.identifier.issn	2329-4949
dc.identifier.issn	2329-4957
dc.identifier.uri	http://hdl.handle.net/10234/189298
dc.description.abstract	Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL part and the COO part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.	ca_CA
dc.format.extent	26 p.	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Association for Computing Machinery (ACM)	ca_CA
dc.relation.isPartOf	ACM Transactions on Parallel Computing, 2020, vol. 7, no 1	ca_CA
dc.rights	Copyright © Association for Computing Machinery	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	Sparse Matrix Vector Product (SpMV)	ca_CA
dc.subject	irregular matrices	ca_CA
dc.subject	GPUs	ca_CA
dc.title	Load-balancing Sparse Matrix Vector Product Kernels on GPUs	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1145/3380930
dc.rights.accessRights	info:eu-repo/semantics/restrictedAccess	ca_CA
dc.relation.publisherVersion	https://dl.acm.org/doi/abs/10.1145/3380930	ca_CA
dc.type.version	info:eu-repo/semantics/publishedVersion	ca_CA

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

ICC_Articles [424]
