Search

Showing items 1-10 of 14

Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs

Martín, Alberto F.; Reyes, Ruymán; Badía Sala, Rosa María; Quintana-Orti, Enrique S. (Elsevier, 2014)

In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major ...

Analytical Modeling is Enough for High Performance BLIS

Low, Tze Meng; Igual, Francisco D.; Smith, Tyler M.; Quintana-Orti, Enrique S. (ACM, 2016-09)

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

Van Zee, Field G.; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio (ACM Digital Library, 2014-04)

We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly ...

Time and energy modeling of a high-performance multi-threaded Cholesky factorization

Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. (Springer, 2016-02-05)

We present accurate time and energy piece-wise models of high-performance multi-threaded implementations for the general matrix multiplication, triangular system solve with multiple right-hand sides, and symmetric rank-k ...

Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations

Costero, Luis; Igual, Francisco D.; Olcoz, Katzalin; Catalán, Sandra; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. (Elsevier, 2017)

Dealing with asymmetry in the architecture opens a plethora of questions related with the performance- and energy-efficient scheduling of task-parallel applications. While there exist early attempts to tackle this problem, ...

Author

Quintana-Orti, Enrique S. (13)

Igual, Francisco D. (8)

Access conditions

Open Access (11)

Restricted Access (3)

Repositori Universitat Jaume I

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Programming matrix algorithms-by-blocks for thread-level parallelism

The libflame library for dense matrix computations

Attaining High Performance in General-Purpose Computations on Current Graphics Processors

Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors