Analytical Modeling is Enough for High Performance BLIS
(ACM, 2016-09)
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
Solving Matrix Equations on Multi-Core and Many-Core Architectures
(MDPI, 2013-12)
We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics ...
Accelerating the Lyapack library using GPUs
(Springer, 2013)
Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures
(Elsevier, 2018)
Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures
(Elsevier, 2017)
Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
(Elsevier, 2015-06)
We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...
Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers
(Springer Verlag, 2015)
We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
(Elsevier, 2015-08)
The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of system. In particular, the exploitation of manycore ...
Parallelizing dense and banded linear algebra libraries using SMPSs
(John Wiley, 2009-12-25)
The promise of future many-core processors, with hundreds of threads running concurrently, has led the developers of linear algebra libraries to rethink their design in order to extract more parallelism, further exploit ...
Increasing data locality and introducing Level-3 BLAS in the Neville elimination
(Elsevier, 2011-12-01)
In this paper we present two new algorithmic variants to compute the Neville elimination, with and without pivoting, which improve data locality and cast most of the computations in terms of high-performance Level 3 BLAS. ...