Load-balancing Sparse Matrix Vector Product Kernels on GPUs
dc.contributor.author | Anzt, Hartwig | |
dc.contributor.author | Cojean, Terry | |
dc.contributor.author | Yen-Chen, Chen | |
dc.contributor.author | Dongarra, Jack | |
dc.contributor.author | Flegar, Goran | |
dc.contributor.author | Nayak, Pratik | |
dc.contributor.author | Tomov, Stanimire | |
dc.contributor.author | Tsai, Yuhsiang M. | |
dc.contributor.author | Wang, Weichung | |
dc.date.accessioned | 2020-07-28T08:00:45Z | |
dc.date.available | 2020-07-28T08:00:45Z | |
dc.date.issued | 2020-03 | |
dc.identifier.citation | Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, and Weichung Wang. 2020. Load-balancing Sparse Matrix Vector Product Kernels on GPUs. ACM Trans. Parallel Comput. 7, 1, Article 2 (March 2020), 26 pages. DOI:https://doi.org/10.1145/3380930 | ca_CA |
dc.identifier.issn | 2329-4949 | |
dc.identifier.issn | 2329-4957 | |
dc.identifier.uri | http://hdl.handle.net/10234/189298 | |
dc.description.abstract | Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL and COO parts is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format. | ca_CA |
dc.format.extent | 26 p. | ca_CA |
dc.language.iso | eng | ca_CA |
dc.publisher | Association for Computing Machinery (ACM) | ca_CA |
dc.relation.isPartOf | ACM Transactions on Parallel Computing, 2020, vol. 7, no 1 | ca_CA |
dc.rights | Copyright © Association for Computing Machinery | ca_CA |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | * |
dc.subject | Sparse Matrix Vector Product (SpMV) | ca_CA |
dc.subject | irregular matrices | ca_CA |
dc.subject | GPUs | ca_CA |
dc.title | Load-balancing Sparse Matrix Vector Product Kernels on GPUs | ca_CA |
dc.type | info:eu-repo/semantics/article | ca_CA |
dc.identifier.doi | https://doi.org/10.1145/3380930 | |
dc.rights.accessRights | info:eu-repo/semantics/restrictedAccess | ca_CA |
dc.relation.publisherVersion | https://dl.acm.org/doi/abs/10.1145/3380930 | ca_CA |
dc.type.version | info:eu-repo/semantics/publishedVersion | ca_CA |
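
The abstract refers to three storage formats (CSR, COO, ELL) and to a hybrid that keeps part of each row in ELL and the overflow in COO. The following is a minimal sequential sketch of that idea in plain Python, not the article's GPU kernels. The function names (`spmv_csr`, `split_hybrid`, `spmv_hybrid`) and the hand-picked `ell_width` are assumptions made for illustration; the article derives the ELL/COO ratio from an analysis of the nonzeros-per-row distribution, which is not reproduced here.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for A stored in compressed sparse row (CSR) format."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def split_hybrid(row_ptr, col_idx, values, ell_width):
    """Split a CSR matrix into an ELL part and a COO part.

    The ELL part holds the first `ell_width` nonzeros of every row, padded
    with explicit zeros; the COO part holds the overflow of longer rows.
    `ell_width` is chosen by hand here (an assumption of this sketch).
    """
    n_rows = len(row_ptr) - 1
    ell_cols = [[0] * ell_width for _ in range(n_rows)]
    ell_vals = [[0.0] * ell_width for _ in range(n_rows)]
    coo = []  # list of (row, col, value) triplets
    for row in range(n_rows):
        for slot, k in enumerate(range(row_ptr[row], row_ptr[row + 1])):
            if slot < ell_width:
                ell_cols[row][slot] = col_idx[k]
                ell_vals[row][slot] = values[k]
            else:
                coo.append((row, col_idx[k], values[k]))
    return ell_cols, ell_vals, coo


def spmv_hybrid(ell_cols, ell_vals, coo, x):
    """Compute y = A @ x for A stored as an ELL part plus a COO part."""
    y = [0.0] * len(ell_cols)
    for row in range(len(y)):
        # Every row does the same amount of work; padded slots add 0.0.
        for c, v in zip(ell_cols[row], ell_vals[row]):
            y[row] += v * x[c]
    for row, c, v in coo:
        y[row] += v * x[c]
    return y


# A 4x4 matrix with one long row (row 0) and three short rows.
row_ptr = [0, 4, 5, 7, 8]
col_idx = [0, 1, 2, 3, 1, 0, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0]

y_csr = spmv_csr(row_ptr, col_idx, values, x)
ell_cols, ell_vals, coo = split_hybrid(row_ptr, col_idx, values, ell_width=2)
y_hybrid = spmv_hybrid(ell_cols, ell_vals, coo, x)
```

With `ell_width=2`, only the two trailing entries of the long row spill into the COO part, while the short rows cost two padded slots in total. A pure ELL layout would need width 4 and eight padded slots for the same matrix, which is the padding-versus-divergence trade-off the abstract describes.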
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |
This item appears in the following collection(s)
- ICC_Articles [424]