Browse by subject "High performance"

Accelerating the Lyapack library using GPUs

Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)

Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...

Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...

Analytical Modeling is Enough for High Performance BLIS

Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...

Attaining High Performance in General-Purpose Computations on Current Graphics Processors

Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)

The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)

The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of system. In particular, the exploitation of manycore ...

Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures

Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2017)

Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...

Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures

Catalán, Sandra; Herrero, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael Elsevier (2018)

Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...

Estudio de variables psicológicas en una muestra de golfistas de alto rendimiento

Fontelles Carceller, Javier Universitat Jaume I (2017-07-18)

Psychology has become a decisive factor in high-performance sport over the years. In sport, a good command of psychological skills can make a big difference relative to other athletes, and in case of ...

Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)

The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the ...

GLAME@lab: An M-script API for Linear Algebra Operations on Graphics Processors

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)

We propose two high-level application programming interfaces (APIs) to use a graphics processing unit (GPU) as a coprocessor for dense linear algebra operations. Combined with an extension of the FLAME API and an ...

Increasing data locality and introducing Level-3 BLAS in the Neville elimination

Alonso-Jordá, Pedro; Cortina Parajón, Raquel; Quintana-Orti, Enrique S.; Ranilla Pastor, José Elsevier (2011-12-01)

In this paper we present two new algorithmic variants to compute the Neville elimination, with and without pivoting, which improve data locality and cast most of the computations in terms of high-performance Level 3 BLAS. ...

Optimising Convolutions for Deep Learning Inference On ARM Cortex-M Processors

Maciá-Lillo, Antonio; Barrachina Mir, Sergio; Fabregat Llueca, German; Dolz, Manuel F. Institute of Electrical and Electronics Engineers Inc. (2024-04-30)

We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. ...

Out-of-Core Solution of Linear Systems on Graphic Processors

Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Rubio, Rafael; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-05)

We combine two high-level application programming interfaces to solve large-scale linear systems with the data stored on disk using current graphics processors. The result is a simple yet powerful tool that enables a ...

Parallelizing dense and banded linear algebra libraries using SMPSs

Badía Sala, Rosa María; Herrero, Josep R.; Labarta Mancho, Jesús; Pérez, Josep M.; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley (2009-12-25)

The promise of future many-core processors, with hundreds of threads running concurrently, has led the developers of linear algebra libraries to rethink their design in order to extract more parallelism, further exploit ...

Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

Dolz, Manuel F.; Barrachina Mir, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat Llueca, German; Tomás, Andrés E. Springer (2023)

In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) ...

Revisiting the Gauss-Huard Algorithm for the Solution of Linear Systems on Graphics Accelerators

Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón, Alfredo Springer (2016-04-02)

In 1979, P. Huard presented an efficient variant of the Gauss-Jordan elimination for the solution of linear systems. In particular, this alternative algorithm exhibits the same computational cost as the traditional LU-based ...

Solving Dense Linear Systems on Graphics Processors

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)

We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We also show how ...

Solving Matrix Equations on Multi-Core and Many-Core Architectures

Benner, Peter; Ezzatti, Pablo; Mena, Hermann; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo MDPI (2013-12)

We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics ...

Solving “Large” Dense Matrix Problems on Multi-Core Processors and GPUs

Marqués-Andrés, Mercedes; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2009-01)

Few realize that, for large matrices, many dense matrix computations achieve nearly the same performance when the matrices are stored on disk as when they are stored in a very large main memory. Similarly, few realize ...

Time and energy modeling of high–performance Level-3 BLAS on x86 architectures

Alonso-Jordá, Pedro; Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Elsevier (2015-06)

We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 ...