• openAccess   Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers 

      Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)
      We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
    • closedAccess   Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software 

      Flegar, Goran; Anzt, Hartwig; Cojean, Terry; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2021-04)
      The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in ...
    • openAccess   Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers 

      Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Higham, Nicholas J.; Quintana-Orti, Enrique S. Wiley (2019-03-25)
      We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). ...
    • openAccess   Adaptive precision solvers for sparse linear systems 

      Anzt, Hartwig; Dongarra, Jack; Quintana-Orti, Enrique S. ACM (2015)
      We formulate an implementation of a Jacobi iterative solver for sparse linear systems that iterates the distinct components of the solution with different precision in terms of mantissa length. Starting with very low ...
    • openAccess   An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization 

      Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)
      We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...
    • closedAccess   An efficient GPU version of the preconditioned GMRES method 

      Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2019-03)
      In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative ...
    • openAccess   An Extension of the StarSs Programming Model for Platforms with Multiple GPUs 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Igual, Francisco; Labarta Mancho, Jesús; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)
      While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized ...
    • openAccess   Analysis of Threading Libraries for High Performance Computing 

      Castelló, Adrián; Mayo, Rafael; Seo, Sangmin; Balaji, Pavan; Quintana-Orti, Enrique S.; Peña Monferrer, Antonio J. IEEE (2020-01-30)
      With the appearance of multi-many core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency that brought new software paradigms. POSIX threads (Pthreads) was widely-adopted for ...
    • openAccess   Analytical Modeling is Enough for High Performance BLIS 

      Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)
      We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
    • openAccess   Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks 

      Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Quintana-Orti, Enrique S.; Duato, José Springer (2022-01-10)
      For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance ...
    • openAccess   Application of Multi-core and GPU Architectures on Signal Processing: Case Studies 

      González, Alberto; Belloch, Jose A.; Piñero, Gema; Lorente, Jorge; Ferrer, Miguel; Roger, Sandra; Roig, Carles; Martínez, Francisco J.; De Diego, María; Alonso-Jordá, Pedro; García, Víctor M.; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Vidal, Antonio M. Universidad Politécnica de Valencia (2010)
      In this article part of the techniques and developments we are carrying out within the INCO2 group are reported. Results follow the interdisciplinary approach with which we tackle signal processing applications. Chosen ...
    • openAccess   Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors 

      Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer US (2016-09)
      Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. ...
    • closedAccess   Are our dense linear algebra libraries energy-friendly? Time–power–energy trade-offs in BLAS and LAPACK 

      Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2015-05)
      In this paper we conduct a detailed analysis of the sources of power dissipation and energy consumption during the execution of current dense linear algebra kernels on multicore processors, binding these two metrics together ...
    • closedAccess   Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors 

      Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer Verlag (2016-05)
      Low-power asymmetric multicore processors (AMPs) have attracted considerable attention due to their appealing performance/power ratio for energy-constrained environments. However, these processors pose a significant ...
    • openAccess   Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers 

      El Mehdi Diouri, Mohammed; Dolz, Manuel F.; Glück, Olivier; Lefèvre, Laurent; Alonso-Jordá, Pedro; Catalán, Sandra; Mayo, Rafael; Quintana-Orti, Enrique S. Elsevier (2014-06)
      Large-scale distributed systems (e.g., datacenters, HPC systems, clouds, large-scale networks, etc.) consume and will consume enormous amounts of energy. Therefore, accurately monitoring the power dissipation and energy ...
    • openAccess   Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems 

      Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Dolz, Manuel F.; Martín Huertas, Alberto F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer US (2014)
      We investigate the benefits that an energy-aware implementation of the runtime in charge of the concurrent execution of ILUPACK (a sophisticated preconditioned iterative solver for sparse linear systems) produces on the ...
    • closedAccess   Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing 

      Sánchez, S.; León Navarro, Germán; Plaza, Antonio; Quintana-Orti, Enrique S. IEEE (2014)
      Remotely sensed hyperspectral imaging missions are often limited by onboard power restrictions while, simultaneously, require high computing power in order to address applications with relevant constraints in terms of ...
    • openAccess   Attaining High Performance in General-Purpose Computations on Current Graphics Processors 

      Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)
      The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...
    • openAccess   Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs 

      Aliaga Estellés, José Ignacio; Anzt, Hartwig; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique; Tsai, Yuhsiang M. Springer (2021)
      We contribute to the optimization of the sparse matrix-vector product on graphics processing units by introducing a variant of the coordinate sparse matrix layout that compresses the integer representation of the matrix ...
    • closedAccess   Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi 

      Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)
      The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...