Analytical Modeling is Enough for High Performance BLIS
(ACM, 2016-09)
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
Solving Matrix Equations on Multi-Core and Many-Core Architectures
(MDPI, 2013-12)
We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics ...
Revisiting the Gauss-Huard Algorithm for the Solution of Linear Systems on Graphics Accelerators
(Springer, 2016-04-02)
In 1979, P. Huard presented an efficient variant of the Gauss-Jordan elimination for the solution of linear systems. In particular, this alternative algorithm exhibits the same computational cost as the traditional LU-based ...
Accelerating the Lyapack library using GPUs
(Springer, 2013)
Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures
(Elsevier, 2018)
Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures
(Elsevier, 2017)
Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
(Elsevier, 2015-06)
We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...
Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers
(Springer Verlag, 2015)
We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
Attaining High Performance in General-Purpose Computations on Current Graphics Processors
(Departament d'Enginyeria i Ciència dels Computadors, Universitat Jaume I, 2008-01)
The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...
Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors
(Departament d'Enginyeria i Ciència dels Computadors, Universitat Jaume I, 2008-01)
The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the ...