Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures

Heavner, Nathan; Igual, Francisco; Quintana-Ortí, Gregorio; Martinsson, Gunnar

dc.contributor.author	Heavner, Nathan
dc.contributor.author	Igual, Francisco
dc.contributor.author	Quintana-Ortí, Gregorio
dc.contributor.author	Martinsson, Gunnar
dc.date.accessioned	2022-10-06T11:33:01Z
dc.date.available	2022-10-06T11:33:01Z
dc.date.issued	2022-06
dc.identifier.citation	N. Heavner, F. D. Igual, G. Quintana-Ortí, and P. G. Martinsson. 2022. Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures. ACM Trans. Math. Softw. 48, 2, Article 21 (June 2022), 42 pages. https://doi.org/10.1145/3507466	ca_CA
dc.identifier.issn	0098-3500
dc.identifier.issn	1557-7295
dc.identifier.uri	http://hdl.handle.net/10234/200216
dc.description.abstract	Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently proposed algorithm “randUTV” computes a full factorization of a given matrix that provides low-rank approximations with near-optimal error. Because the bulk of randUTV is cast in terms of communication-efficient operations such as matrix-matrix multiplication and unpivoted QR factorizations, it is faster than competing rank-revealing factorization methods such as column-pivoted QR in most high-performance computational settings. In this article, optimized randUTV implementations are presented for both shared-memory and distributed-memory computing environments. For shared memory, randUTV is redesigned in terms of an algorithm-by-blocks that, together with a runtime task scheduler, eliminates bottlenecks from data synchronization points to achieve acceleration over the standard blocked algorithm based on a purely fork-join approach. The distributed-memory implementation is based on the ScaLAPACK library. The performance of our new codes compares favorably with competing factorizations available on both shared-memory and distributed-memory architectures.	ca_CA
dc.format.extent	42 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Association for Computing Machinery (ACM)	ca_CA
dc.relation.isPartOf	ACM Transactions on Mathematical Software (TOMS), 2022, vol. 48, no 2	ca_CA
dc.rights	Copyright © ACM, Inc.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/CNE/1.0/	ca_CA
dc.subject	mathematics of computing	ca_CA
dc.subject	computations on matrices	ca_CA
dc.title	Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1145/3507466
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	https://dl.acm.org/doi/full/10.1145/3507466	ca_CA
dc.type.version	info:eu-repo/semantics/submittedVersion	ca_CA
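
The abstract describes randUTV as building on the ideas behind the randomized singular value decomposition (RSVD). As an illustration only, the following is a minimal NumPy sketch of the basic RSVD idea (random sampling of the range, unpivoted QR, then a small dense SVD). It is not the randUTV algorithm and not the authors' code; the function name `rsvd` and the parameters `k`, `p`, `q`, and `seed` are assumptions introduced here, and the sketch assumes a real-valued input matrix.

```python
import numpy as np


def rsvd(A, k, p=10, q=1, seed=0):
    """Hypothetical helper: approximate rank-k SVD of a real matrix A.

    k: target rank, p: oversampling, q: number of power iterations.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + p, m, n)  # number of random samples to draw

    # Sample the range of A with a Gaussian test matrix (matrix-matrix multiply).
    Y = A @ rng.standard_normal((n, ell))

    # Optional power iterations, re-orthonormalized with unpivoted QR for stability.
    for _ in range(q):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(A.T @ Q)
        Y = A @ Z

    # Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)

    # Project A onto that basis and take the SVD of the small ell-by-n matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Lift the left singular vectors back to the original space and truncate.
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]
```

A call such as `U, s, Vt = rsvd(A, k=5)` returns factors whose product `U @ np.diag(s) @ Vt` approximates `A`. Almost all of the work is in matrix-matrix products and unpivoted QR factorizations, which are the kinds of operations the abstract identifies as communication-efficient.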

Files in this item

Name: 81186.pdf
Size: 1.322Mb
Format: PDF
Description: Pre-print version


This item appears in the following collection(s)

ICC_Articles [425]
