    • closedAccess   Accelerating the Lyapack library using GPUs 

      Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
    • openAccess   Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers 

      Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)
      We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
    • openAccess   Analytical Modeling is Enough for High Performance BLIS 

      Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)
      We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
    • openAccess   Attaining High Performance in General-Purpose Computations on Current Graphics Processors 

      Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)
      The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...
    • closedAccess   Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi 

      Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)
      The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...
    • openAccess   Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures 

      Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2017)
      Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
    • closedAccess   Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures 

      Catalán, Sandra; Herrero, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2018)
      Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
    • openAccess   Estudio de variables psicológicas en una muestra de golfistas de alto rendimiento 

      Fontelles Carceller, Javier Universitat Jaume I (2017-07-18)
      Psychology has become a decisive factor in high-performance sport over the years. In sport, a good command of psychological skills can make a big difference between athletes, and in case of ...
    • openAccess   Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)
      The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the ...
    • openAccess   GLAME@lab: An M-script API for Linear Algebra Operations on Graphics Processors 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)
      We propose two high-level application programming interfaces (APIs) to use a graphics processing unit (GPU) as a coprocessor for dense linear algebra operations. Combined with an extension of the FLAME API and an ...
    • closedAccess   Increasing data locality and introducing Level-3 BLAS in the Neville elimination 

      Alonso-Jordá, Pedro; Cortina Parajón, Raquel; Quintana-Orti, Enrique S.; Ranilla Pastor, José Elsevier (2011-12-01)
      In this paper we present two new algorithmic variants to compute the Neville elimination, with and without pivoting, which improve data locality and cast most of the computations in terms of high-performance Level 3 BLAS. ...
    • openAccess   Optimising Convolutions for Deep Learning Inference On ARM Cortex-M Processors 

      Maciá-Lillo, Antonio; Barrachina Mir, Sergio; Fabregat Llueca, German; Dolz, Manuel F. Institute of Electrical and Electronics Engineers Inc. (2024-04-30)
      We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. ...
    • openAccess   Out-of-Core Solution of Linear Systems on Graphic Processors 

      Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Rubio, Rafael; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-05)
      We combine two high-level application programming interfaces to solve large-scale linear systems with the data stored on disk using current graphics processors. The result is a simple yet powerful tool that enables a ...
    • openAccess   Parallelizing dense and banded linear algebra libraries using SMPSs 

      Badía Sala, Rosa María; Herrero, Josep R.; Labarta Mancho, Jesús; Pérez, Josep M.; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley (2009-12-25)
      The promise of future many-core processors, with hundreds of threads running concurrently, has led the developers of linear algebra libraries to rethink their design in order to extract more parallelism, further exploit ...
    • openAccess   Performance–energy trade-offs of deep learning convolution algorithms on ARM processors 

      Dolz, Manuel F.; Barrachina Mir, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat Llueca, German; Tomás, Andrés E. Springer (2023)
      In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) ...
    • closedAccess   Revisiting the Gauss-Huard Algorithm for the Solution of Linear Systems on Graphics Accelerators 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón, Alfredo Springer (2016-04-02)
      In 1979, P. Huard presented an efficient variant of the Gauss-Jordan elimination for the solution of linear systems. In particular, this alternative algorithm exhibits the same computational cost as the traditional LU-based ...
    • openAccess   Solving Dense Linear Systems on Graphics Processors 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)
      We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We also show how ...
    • openAccess   Solving Matrix Equations on Multi-Core and Many-Core Architectures 

      Benner, Peter; Ezzatti, Pablo; Mena, Hermann; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo MDPI (2013-12)
      We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics ...
    • openAccess   Solving “Large” Dense Matrix Problems on Multi-Core Processors and GPUs 

      Marqués-Andrés, Mercedes; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2009-01)
      Few realize that, for large matrices, many dense matrix computations achieve nearly the same performance when the matrices are stored on disk as when they are stored in a very large main memory. Similarly, few realize ...
    • closedAccess   Time and energy modeling of high–performance Level-3 BLAS on x86 architectures 

      Alonso-Jordá, Pedro; Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Elsevier (2015-06)
      We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...