Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors
Title
Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors
Publication date
2023-08-04
Publisher
Wiley
Bibliographic citation
ALIAGA, José I., et al. Sparse matrix‐vector and matrix‐multivector products for the truncated SVD on graphics processors. Concurrency and Computation: Practice and Experience, 2023, p. e7871.
Document type
info:eu-repo/semantics/article
Version
info:eu-repo/semantics/publishedVersion
Abstract
Many practical algorithms for numerical rank computations implement an iterative procedure that involves repeated multiplications of a vector, or a collection of vectors, with both a sparse matrix A and its transpose. Unfortunately, the realization of these sparse products in current high-performance libraries often delivers much lower arithmetic throughput when the matrix involved in the product is transposed. In this work, we propose a hybrid sparse matrix layout, named CSRC, that combines the flexibility of some well-known sparse formats to offer a number of appealing properties: (1) CSRC can be obtained at low cost from the popular CSR (compressed sparse row) format; (2) CSRC has storage requirements similar to those of CSR; and especially, (3) the implementation of the sparse product kernels delivers high performance for both the direct product and its transposed variant on modern graphics accelerators thanks to a significant reduction of atomic operations compared to a conventional implementation based on CSR. This solution thus yields considerably higher performance when integrated into an iterative algorithm for the truncated singular value decomposition (SVD), such as the randomized SVD or, as demonstrated in the experimental results, the block Golub–Kahan–Lanczos algorithm.
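As a rough illustration of why the transposed product is harder on CSR, the following sequential Python sketch computes both y = A x and y = Aᵀ x from the same CSR arrays. It is not the CSRC layout and not the authors' GPU kernels; the function names and the small test matrix are hypothetical, chosen only for this example. In the direct product each row accumulates into its own output entry, whereas in the transposed product each row scatters into output entries indexed by column, which is where a row-parallel GPU kernel would typically need atomic additions.

```python
def csr_matvec(n_rows, row_ptr, col_idx, values, x):
    """Direct product y = A x: row i gathers into its own y[i]."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def csr_matvec_transposed(n_cols, row_ptr, col_idx, values, x):
    """Transposed product y = A^T x: row i scatters into y[col_idx[k]]."""
    y = [0.0] * n_cols
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Different rows can update the same column entry, so a
            # one-thread-per-row parallel version would need an atomic add here.
            y[col_idx[k]] += values[k] * x[i]
    return y


# Hypothetical 2 x 3 example: A = [[1, 0, 2], [0, 3, 4]] stored in CSR.
row_ptr = [0, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]

y_direct = csr_matvec(2, row_ptr, col_idx, values, [1.0, 1.0, 1.0])
y_transposed = csr_matvec_transposed(3, row_ptr, col_idx, values, [1.0, 1.0])
```

In this example both rows contribute to column 2 in the transposed product, which is the kind of write conflict the abstract refers to.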
Funding entity
US Exascale Computing Project | U.S. Department of Energy Office of Science | European High-Performance Computing Joint Undertaking (JU) | European Union's Horizon 2020 Research and Innovation Programme | Spanish National Plan for Scientific and Technical Research and Innovation (MCIN/AEI/10.13039/501100011033) | Universitat Jaume I
Project or grant code
17-SC-20-SC | 955558 (eFlows4HPC project) | PID2020-113656RB | UJI-B2021-58
Access rights
© 2023 The Authors. Concurrency and Computation: Practice and Experience published by John Wiley & Sons Ltd.
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [427]