Browse by subject "High-performance"

A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Quintana-Ortí, Gregorio; Igual, Francisco D.; Marqués-Andrés, Mercedes; Quintana-Ortí, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)

Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...

An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

Quintana-Ortí, Gregorio; Quintana-Ortí, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)

We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...

Families of Algorithms for Reducing a Matrix to Condensed Form

Van Zee, Field G.; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio; Elizondo, G. Joseph ACM (2012-11)

In a recent paper it was shown how memory traffic can be diminished by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocess when computing the singular values of a dense matrix. The ...

Programming matrix algorithms-by-blocks for thread-level parallelism

Quintana-Ortí, Gregorio; Quintana-Ortí, Enrique S.; Van de Geijn, Robert A.; Van Zee, Field G.; Chan, Ernie Association for Computing Machinery (2009-07)

With the emergence of thread-level parallelism as the primary means for continued improvement of performance, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that ...

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

Van Zee, Field G.; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio ACM Digital Library (2014-04)

We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly ...