    • openAccess   A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting

      Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael; Van de Geijn, Robert A. IEEE (2019-01)
      We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target ...
    • closedAccess   A complete and efficient CUDA-sharing solution for HPC clusters 

      Peña Monferrer, Antonio J.; Reaño, Carlos; Silla, Federico; Mayo, Rafael; Quintana-Orti, Enrique S.; Duato, José Elsevier (2014)
      In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from ...
    • closedAccess   A factored variant of the Newton iteration for the solution of algebraic Riccati equations via the matrix sign function 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      In this paper we introduce a variant of the Newton iteration for the matrix sign function that results in an efficient numerical solver for a certain class of algebraic Riccati equations (AREs). In particular, when the ...
    • closedAccess   A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors 

      Aliaga Estellés, José Ignacio; Alonso-Jordá, Pedro; Badía, José; Chacón, Pablo; Davidovic, Davor; López Blanco, José R.; Quintana-Orti, Enrique S. Elsevier (2016-03-15)
      We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). ...
    • openAccess   A framework for genomic sequencing on clusters of multicore and manycore processors 

      Martínez Pérez, Héctor; Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Tárraga, Joaquín; Medina, Ignacio; Dopazo, Joaquín; Quintana-Orti, Enrique S. Sage (2016-06)
      The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target ...
    • closedAccess   A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms 

      Benner, Peter; Ezzatti, Pablo; Kressner, Daniel; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Elsevier (2011)
      We describe a hybrid Lyapunov solver based on the matrix sign function, where the intensive parts of the computation are accelerated using a graphics processor (GPU) while executing the remaining operations on a general-purpose ...
    • closedAccess   A parallel solver for huge dense linear systems  

      Badía, José; Movilla, Jose L.; Climente, Juan I.; Castillo Catalán, María Isabel; Marqués-Andrés, Mercedes; Mayo, Rafael; Quintana-Orti, Enrique S.; Planelles, Josep Elsevier (2011-11)
      HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to ...
    • closedAccess   A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures 

      Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)
      Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
    • openAccess   A simulator to assess energy saving strategies and policies in HPC workloads 

      Quintana-Orti, Enrique S.; Mayo, Rafael; Iserte, Sergio; Fernández Fernández, Juan Carlos; Dolz, Manuel F. Association for Computing Machinery (ACM) (2012-07)
      In recent years power consumption of high performance computing (HPC) clusters has become a growing problem due, e.g., to the economic cost of electricity, the emission of carbon dioxide (with negative impact on the ...
    • openAccess   Accelerating multi-channel filtering of audio signal on ARM processors 

      Belloch, Jose A.; Alventosa, Juan J.; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S.; Vidal, Antonio M. Springer Verlag (2016-03)
    • closedAccess   Accelerating the Lyapack library using GPUs 

      Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
    • openAccess   Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL 

      Badía, José; Belloch, Jose A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)
      The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...
    • closedAccess   Accelerating the task/data-parallel version of ILUPACK’s BiCG in multi-CPU/GPU configurations 

      Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Elsevier (2019)
      ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on ...
    • closedAccess   Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Anzt, Hartwig; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2020-03)
      We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...
    • openAccess   Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers 

      Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)
      We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...
    • closedAccess   Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software 

      Flegar, Goran; Anzt, Hartwig; Cojean, Terry; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2021-04)
      The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in ...
    • openAccess   Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers 

      Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Higham, Nicholas J.; Quintana-Orti, Enrique S. Wiley (2019-03-25)
      We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). ...
    • openAccess   An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization 

      Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)
      We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...
    • closedAccess   An efficient GPU version of the preconditioned GMRES method 

      Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2019-03)
      In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative ...
    • openAccess   Analysis of Threading Libraries for High Performance Computing 

      Castelló, Adrián; Mayo, Rafael; Seo, Sangmin; Balaji, Pavan; Quintana-Orti, Enrique S.; Peña Monferrer, Antonio J. IEEE (2020-01-30)
      With the appearance of multi- and many-core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency that brought new software paradigms. POSIX threads (Pthreads) was widely adopted for ...