Listar por autoría "4eeb2085-f242-47ae-82f0-6c88ea5c4680"

Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

Flegar, Goran; Anzt, Hartwig; Cojean, Terry; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2021-04)

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in ...

Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers

Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Higham, Nicholas J.; Quintana-Orti, Enrique S. Wiley (2019-03-25)

We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). ...

Adaptive precision solvers for sparse linear systems

Anzt, Hartwig; Dongarra, Jack; Quintana-Orti, Enrique S. ACM (2015)

We formulate an implementation of a Jacobi iterative solver for sparse linear systems that iterates the distinct components of the solution with different precision in terms of mantissa length. Starting with very low ...

An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)

We pursue the scalable parallel implementation of the factor- ization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...

An efficient GPU version of the preconditioned GMRES method

Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2019-03)

In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative ...

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

Ayguadé, Eduardo; Badía Sala, Rosa María; Igual, Francisco; Labarta Mancho, Jesús; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)

While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized ...

Analysis of Threading Libraries for High Performance Computing

Castelló, Adrián; Mayo, Rafael; Seo, Sangmin; Balaji, Pavan; Quintana-Orti, Enrique S.; Peña Monferrer, Antonio J. IEEE (2020-01-30)

With the appearance of multi-many core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency that brought new software paradigms. POSIX threads (Pthreads) was widely-adopted for ...

Analytical Modeling is Enough for High Performance BLIS

Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...

Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Quintana-Orti, Enrique S.; Duato, José Springer (2022-01-10)

For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance ...

Application of Multi-core and GPU Architectures on Signal Processing: Case Studies

González, Alberto; BELLOCH, JOSE A.; Piñero, Gema; Lorente, Jorge; Ferrer, Miguel; Roger, Sandra; Roig, Carles; Martínez, Francisco J.; De Diego, María; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)

In this article part of the techniques and developments we are carrying out within the INCO2 group are reported. Results follow the interdisciplinary approach with which we tackle signal processing applications. Chosen ...

Architecture-Aware Con guration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer US (2016-09)

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. ...

Are our dense linear algebra libraries energy-friendly?. Time–power–energy trade-offs in BLAS and LAPACK

Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2015-05)

In this paper we conduct a detailed analysis of the sources of power dissipation and energy consumption during the execution of current dense linear algebra kernels on multicore processors, binding these two metrics together ...

Arquitecture-aware optimization of an hevc decoder on asymmetric multicore processors

Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer Verlag (2016-05)

Low-power asymmetric multicore processors (AMPs) have attracted considerable attention due to their appealing performance/power ratio for energy-constrained environments. However, these processors pose a significant ...

Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers

El Mehdi Diouria, Mohammed; Dolz, Manuel F.; Glückc, Olivier; Lefèvre, Laurent; Alonso-Jordá, Pedro; Catalán, Sandra; Mayo, Rafael; Quintana-Orti, Enrique S. Elsevier (2014-06)

Large-scale distributed systems (e.g., datacenters, HPC systems, clouds, large-scale networks, etc.) consume and will consume enormous amounts of energy. Therefore, accurately monitoring the power dissipation and energy ...

Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems

Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Dolz, Manuel F.; Martín Huertas, Alberto F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer US (2014)

We investigate the benefits that an energyaware implementation of the runtime in charge of the concurrent execution of ILUPACK —a sophisticated preconditioned iterative solver for sparse linear systems— produces on the ...

Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing

Sánchez, S.; León Navarro, Germán; Plaza, Antonio; Quintana-Orti, Enrique S. IEEE (2014)

Remotely sensed hyperspectral imaging missions are often limited by onboard power restrictions while, simultaneously, require high computing power in order to address applications with relevant constraints in terms of ...

Attaining High Performance in General-Purpose Computations on Current Graphics Processors

Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)

The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...

Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs

Aliaga Estellés, José Ignacio; Anzt, Hartwig; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique; Tsai, Yuhsiang M. Springer (2021)

We contribute to the optimization of the sparse matrix-vector product on graphics processing units by introducing a variant of the coordinate sparse matrix layout that compresses the integer representation of the matrix ...

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)

The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...