Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers
(Springer Verlag, 2015)
We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS
(Elsevier North-Holland, 2022-03-22)
We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors ...
Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors
(Elsevier Academic Press, 2022-05-30)
Convolutional Neural Networks (CNNs) play a crucial role in many image recognition and classification tasks, recommender systems, brain-computer interfaces, etc. As a consequence, there is a notable interest in developing ...
Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems
(Springer US, 2014)
We investigate the benefits that an energy-aware implementation of the runtime in charge of the concurrent execution of ILUPACK (a sophisticated preconditioned iterative solver for sparse linear systems) produces on the ...
GPU-based Dynamic Wave Field Synthesis using Fractional Delay Filters and Room Compensation
(IEEE, 2017-02)
Wave Field Synthesis (WFS) is a multichannel audio reproduction method of considerable computational cost that renders an accurate spatial sound field using a large number of loudspeakers to emulate virtual sound ...
On the performance of a GPU-based SoC in a distributed spatial audio system
(Springer, 2021-01-04)
Many current system-on-chip (SoC) devices are composed of low-power multicore processors combined with a small graphics accelerator (or GPU) offering a trade-off between computational capacity and low-power consumption. ...
The Impact of the Multi-core Revolution on Signal Processing
(Universidad Politécnica de Valencia, 2010)
This paper analyzes the influence of new multi-core and many-core architectures on Signal Processing. The article covers both the architectural design and the programming models of current general-purpose multi-core ...
Exploiting nested task-parallelism in the H-LU factorization
(Elsevier, 2019-04)
We address the parallelization of the LU factorization of hierarchical matrices (H-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, which ...