Browse ICC_Articles by author "d1d592b2-abd5-449d-9a7c-143b1de4b854"

A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

Aliaga Estellés, José Ignacio; Alonso-Jordá, Pedro; Badía, José; Chacón, Pablo; Davidovic, Davor; López Blanco, José R.; Quintana-Orti, Enrique S. Elsevier (2016-03-15)

We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). ...

A pipeline structure for the block QR update in digital signal processing

Dolz, Manuel F.; Alventosa, Fran J.; Alonso-Jordá, Pedro Springer (2019-03)

Some problems in the field of digital signal processing, such as the filtering of acoustic signals, require processing a large amount of data in real time. The beamforming algorithm, for instance, is a process that ...

Accelerating multi-channel filtering of audio signal on ARM processors

Belloch, Jose A.; Alventosa, Juan J.; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S.; Vidal, Antonio M. Springer Verlag (2016-03)

Application of Multi-core and GPU Architectures on Signal Processing: Case Studies

González, Alberto; Belloch, Jose A.; Piñero, Gema; Lorente, Jorge; Ferrer, Miguel; Roger, Sandra; Roig, Carles; Martínez, Francisco J.; De Diego, María; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)

This article reports part of the techniques and developments we are carrying out within the INCO2 group. Results follow the interdisciplinary approach with which we tackle signal processing applications. Chosen ...

Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers

El Mehdi Diouri, Mohammed; Dolz, Manuel F.; Glück, Olivier; Lefèvre, Laurent; Alonso-Jordá, Pedro; Catalán, Sandra; Mayo, Rafael; Quintana-Orti, Enrique S. Elsevier (2014-06)

Large-scale distributed systems (e.g., datacenters, HPC systems, clouds, large-scale networks, etc.) consume and will consume enormous amounts of energy. Therefore, accurately monitoring the power dissipation and energy ...

Automatic generation of ARM NEON micro‑kernels for matrix multiplication

Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)

General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...

DVFS-control techniques for dense linear algebra operations on multi-core processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Springer (2012-11)

This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as ...

Efficient and portable Winograd convolutions for multi-core processors

Dolz, Manuel F.; Martínez, Héctor; Castelló, Adrián; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2023-02-12)

We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, ...

Energy-efficient execution of dense linear algebra algorithms on multi-core processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Verlag (2013-09)

This paper addresses the efficient exploitation of task-level parallelism, present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. The strategies ...

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra

Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2014-06)

The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate ...

Fast block QR update in digital signal processing

Alventosa, Fran J.; Alonso-Jordá, Pedro; Vidal, Antonio M.; Piñero, Gema; Quintana-Orti, Enrique S. Springer (2019-03)

The processing of digital sound signals often requires the computation of the QR factorization of a rectangular system matrix. However, sometimes, only a given (and probably small) part of the system matrix varies from the ...

Increasing data locality and introducing Level-3 BLAS in the Neville elimination

Alonso-Jordá, Pedro; Cortina Parajón, Raquel; Quintana-Orti, Enrique S.; Ranilla Pastor, José Elsevier (2011-12-01)

In this paper we present two new algorithmic variants to compute the Neville elimination, with and without pivoting, which improve data locality and cast most of the computations in terms of high-performance Level 3 BLAS. ...

Modeling power and energy consumption of dense matrix factorizations on multicore processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2013-10-11)

In this paper, we propose a model for the energy consumption of the concurrent execution of three key dense matrix factorizations, with task parallelism leveraged via the Symmetric Multi-Processing Superscalar (SMPSs) ...

Modeling power and energy of the task-parallel Cholesky factorization on multicore processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2014-05)

In this paper we introduce a model for the total energy consumption of the Cholesky factorization on a multicore processor. Our model assumes a task-parallel execution of the factorization process, with concurrency leveraged ...

Performance modeling of the sparse matrix–vector product via convolutional neural networks

Barreda Vayá, Maria; Dolz, Manuel F.; Castaño Álvarez, María Asunción; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2020-02-04)

Modeling the execution time of the sparse matrix–vector multiplication (SpMV) on a current CPU architecture is especially complex due to (i) irregular memory accesses; (ii) indirect memory referencing; and (iii) low ...

The Impact of the Multi-core Revolution on Signal Processing

González, Alberto; Belloch, Jose A.; Martínez, Francisco J.; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)

This paper analyzes the influence of new multi-core and many-core architectures on Signal Processing. The article covers both the architectural design and the programming models of current general-purpose multi-core ...

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Alonso-Jordá, Pedro; Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Elsevier (2015-06)

We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...

Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors

Alonso-Jordá, Pedro; Catalán, Sandra; Herrero, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2018)

We investigate how to leverage the heterogeneous resources of an Asymmetric Multicore Processor (AMP) in order to deliver high performance in the reduction to condensed forms for the solution of dense eigenvalue and ...