Browse by subject "High-performance"
Showing items 1-5 of 5
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
ACM (2012-08) Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
Springer Verlag (2008) We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...
Families of Algorithms for Reducing a Matrix to Condensed Form
ACM (2012-11) In a recent paper it was shown how memory traffic can be diminished by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocess when computing the singular values of a dense matrix. The ...
Programming matrix algorithms-by-blocks for thread-level parallelism
Association for Computing Machinery (2009-07) With the emergence of thread-level parallelism as the primary means for continued improvement of performance, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that ...
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
ACM Digital Library (2014-04) We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly ...