Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures
Metadata
Title
Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures
Publication date
2022-06
Publisher
Association for Computing Machinery (ACM)
ISSN
0098-3500; 1557-7295
Bibliographic citation
N. Heavner, F. D. Igual, G. Quintana-Ortí, and P. G. Martinsson. 2022. Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures. ACM Trans. Math. Softw. 48, 2, Article 21 (June 2022), 42 pages. https://doi.org/10.1145/3507466
Document type
info:eu-repo/semantics/article
Publisher's version
https://dl.acm.org/doi/full/10.1145/3507466
Version
info:eu-repo/semantics/submittedVersion
Keywords / Subjects
Abstract
Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently proposed algorithm “randUTV” computes a full factorization of a given matrix that provides low-rank approximations with near-optimal error. Because the bulk of randUTV is cast in terms of communication-efficient operations such as matrix-matrix multiplication and unpivoted QR factorizations, it is faster than competing rank-revealing factorization methods such as column-pivoted QR in most high-performance computational settings. In this article, optimized randUTV implementations are presented for both shared-memory and distributed-memory computing environments. For shared memory, randUTV is redesigned in terms of an algorithm-by-blocks that, together with a runtime task scheduler, eliminates bottlenecks from data synchronization points to achieve acceleration over the standard blocked algorithm based on a purely fork-join approach. The distributed-memory implementation is based on the ScaLAPACK library. The performance of our new codes compares favorably with competing factorizations available on both shared-memory and distributed-memory architectures.
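The abstract notes that randUTV is built from the same communication-efficient primitives as RSVD: matrix-matrix multiplication and unpivoted QR. The following NumPy sketch illustrates those building blocks in a basic randomized SVD; it is not the authors' randUTV implementation (the published Algorithm 1022 code should be consulted for that), and the function name and parameters here are illustrative assumptions.

```python
import numpy as np

def rsvd(A, k, p=10, q=1, seed=0):
    """Basic randomized SVD sketch: rank-k approximation of A.

    Illustrative only; follows the standard random-sketch + unpivoted-QR
    pattern that the abstract identifies as randUTV's core primitives.
    k: target rank, p: oversampling, q: power iterations.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((n, k + p))   # Gaussian sketching matrix
    Y = A @ G                             # sample the column space of A
    for _ in range(q):                    # power iterations sharpen the basis
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)               # unpivoted QR gives an orthonormal basis
    B = Q.T @ A                           # small (k+p) x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                            # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]
```

Because every step is either a large matrix-matrix multiply or an unpivoted QR, the pattern maps well onto BLAS-3 kernels, which is the communication-efficiency argument the abstract makes for randUTV over column-pivoted QR.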
Published in
ACM Transactions on Mathematical Software (TOMS), 2022, vol. 48, no. 2
Access rights
Appears in collections
- ICC_Articles [423]