• closedAccess   A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors 

      Aliaga Estellés, José Ignacio; Alonso-Jordá, Pedro; Badía, José; Chacón, Pablo; Davidovic, Davor; López Blanco, José R.; Quintana-Orti, Enrique S. Elsevier (2016-03-15)
      We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). ...
    • openAccess   A pipeline structure for the block QR update in digital signal processing 

      Dolz, Manuel F.; Alventosa, Fran J.; Alonso-Jordá, Pedro Springer (2019-03)
      There exist problems in the field of digital signal processing, such as filtering of acoustic signals that require processing a large amount of data in real time. The beamforming algorithm, for instance, is a process that ...
    • openAccess   Accelerating multi-channel filtering of audio signal on ARM processors 

      Belloch, Jose A.; Alventosa, Juan J.; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S.; Vidal, Antonio M. Springer Verlag (2016-03)
    • openAccess   Application of Multi-core and GPU Architectures on Signal Processing: Case Studies 

      González, Alberto; Belloch, Jose A.; Piñero, Gema; Lorente, Jorge; Ferrer, Miguel; Roger, Sandra; Roig, Carles; Martínez, Francisco J.; De Diego, María; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)
      In this article part of the techniques and developments we are carrying out within the INCO2 group are reported. Results follow the interdisciplinary approach with which we tackle signal processing applications. Chosen ...
    • openAccess   Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers 

      El Mehdi Diouri, Mohammed; Dolz, Manuel F.; Glück, Olivier; Lefèvre, Laurent; Alonso-Jordá, Pedro; Catalán, Sandra; Mayo, Rafael; Quintana-Orti, Enrique S. Elsevier (2014-06)
      Large-scale distributed systems (e.g., datacenters, HPC systems, clouds, large-scale networks, etc.) consume and will consume enormous amounts of energy. Therefore, accurately monitoring the power dissipation and energy ...
    • openAccess   Automatic generation of ARM NEON micro‑kernels for matrix multiplication 

      Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)
      General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...
    • closedAccess   DVFS-control techniques for dense linear algebra operations on multi-core processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Springer (2012-11)
      This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as ...
    • openAccess   Efficient and portable Winograd convolutions for multi-core processors 

      Dolz, Manuel F.; Martínez, Héctor; Castelló, Adrián; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2023-02-12)
      We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, ...
    • closedAccess   Energy-efficient execution of dense linear algebra algorithms on multi-core processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Verlag (2013-09)
      This paper addresses the efficient exploitation of task-level parallelism, present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. The strategies ...
    • closedAccess   Enhancing performance and energy consumption of runtime schedulers for dense linear algebra 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2014-06)
      The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate ...
    • closedAccess   Fast block QR update in digital signal processing 

      Alventosa, Fran J.; Alonso-Jordá, Pedro; Vidal, Antonio M.; Piñero, Gema; Quintana-Orti, Enrique S. Springer (2019-03)
      The processing of digital sound signals often requires the computation of the QR factorization of a rectangular system matrix. However, sometimes, only a given (and probably small) part of the system matrix varies from the ...
    • closedAccess   Increasing data locality and introducing Level-3 BLAS in the Neville elimination 

      Alonso-Jordá, Pedro; Cortina Parajón, Raquel; Quintana-Orti, Enrique S.; Ranilla Pastor, José Elsevier (2011-12-01)
      In this paper we present two new algorithmic variants to compute the Neville elimination, with and without pivoting, which improve data locality and cast most of the computations in terms of high-performance Level 3 BLAS. ...
    • closedAccess   Modeling power and energy consumption of dense matrix factorizations on multicore processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2013-10-11)
      In this paper, we propose a model for the energy consumption of the concurrent execution of three key dense matrix factorizations, with task parallelism leveraged via the Symmetric Multi-Processing Superscalar (SMPSs) ...
    • closedAccess   Modeling power and energy of the task-parallel Cholesky factorization on multicore processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2014-05)
      In this paper we introduce a model for the total energy consumption of the Cholesky factorization on a multicore processor. Our model assumes a task-parallel execution of the factorization process, with concurrency leveraged ...
    • closedAccess   Performance modeling of the sparse matrix–vector product via convolutional neural networks 

      Barreda Vayá, Maria; Dolz, Manuel F.; Castaño Álvarez, María Asunción; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2020-02-04)
      Modeling the execution time of the sparse matrix–vector multiplication (SpMV) on a current CPU architecture is especially complex due to (i) irregular memory accesses; (ii) indirect memory referencing; and (iii) low ...
    • openAccess   The Impact of the Multi-core Revolution on Signal Processing 

      González, Alberto; Belloch, Jose A.; Martínez, Francisco J.; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)
      This paper analyzes the influence of new multi-core and many-core architectures on Signal Processing. The article covers both the architectural design and the programming models of current general-purpose multi-core ...
    • closedAccess   Time and energy modeling of high–performance Level-3 BLAS on x86 architectures 

      Alonso-Jordá, Pedro; Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Elsevier (2015-06)
      We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...
    • closedAccess   Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors 

      Alonso-Jordá, Pedro; Catalán, Sandra; Herrero, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2018)
      We investigate how to leverage the heterogeneous resources of an Asymmetric Multicore Processor (AMP) in order to deliver high performance in the reduction to condensed forms for the solution of dense eigenvalue and ...