    • openAccess   A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting 

      Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael; Van de Geijn, Robert A. IEEE (2019-01)
      We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target ...
    • closedAccess   A complete and efficient CUDA-sharing solution for HPC clusters 

      Peña Monferrer, Antonio J.; Reaño, Carlos; Silla, Federico; Mayo, Rafael; Quintana-Orti, Enrique S.; Duato, José Elsevier (2014)
      In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from ...
    • openAccess   A Data-Parallel ILUPACK for Sparse General and Symmetric Indefinite Linear Systems 

      Aliaga Estellés, José Ignacio; Bollhöfer, Matthias; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2017-05-28)
      The solution of sparse linear systems of large dimension is a critical step in problems that span a diverse range of applications. For this reason, a number of iterative solvers have been developed, among which ILUPACK ...
    • closedAccess   A factored variant of the Newton iteration for the solution of algebraic Riccati equations via the matrix sign function 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      In this paper we introduce a variant of the Newton iteration for the matrix sign function that results in an efficient numerical solver for a certain class of algebraic Riccati equations (AREs). In particular, when the ...
    • closedAccess   A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors 

      Aliaga Estellés, José Ignacio; Alonso-Jordá, Pedro; Badía, José; Chacón, Pablo; Davidovic, Davor; López Blanco, José R.; Quintana-Orti, Enrique S. Elsevier (2016-03-15)
      We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). ...
    • openAccess   A framework for genomic sequencing on clusters of multicore and manycore processors 

      Martínez Pérez, Héctor; Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Tárraga, Joaquín; Medina, Ignacio; Dopazo, Joaquín; Quintana-Orti, Enrique S. Sage (2016-06)
      The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target ...
    • closedAccess   A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms 

      Benner, Peter; Ezzatti, Pablo; Kressner, Daniel; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Elsevier (2011)
      We describe a hybrid Lyapunov solver based on the matrix sign function, where the intensive parts of the computation are accelerated using a graphics processor (GPU) while executing the remaining operations on a general-purpose ...
    • closedAccess   A Parallel Multi-threaded Solver for Symmetric Positive Definite Bordered-Band Linear Systems 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón, Alfredo Springer (2016-04)
      We present a multi-threaded solver for symmetric positive definite linear systems where the coefficient matrix of the problem features a bordered-band non-zero pattern. The algorithms that implement this approach heavily ...
    • closedAccess   A parallel solver for huge dense linear systems  

      Badía, José; Movilla, Jose L.; Climente, Juan I.; Castillo Catalán, María Isabel; Marqués-Andrés, Mercedes; Mayo, Rafael; Quintana-Orti, Enrique S.; Planelles, Josep Elsevier (2011-11)
      HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to ...
    • closedAccess   A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Cabrera, Daniel; Durán, Alejandro; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)
      OpenMP has evolved recently towards expressing unstructured parallelism, targeting the parallelization of a broader range of applications in the current multicore era. Homogeneous multicore architectures from major vendors ...
    • closedAccess   A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures 

      Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)
      Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
    • openAccess   A simulator to assess energy saving strategies and policies in HPC workloads 

      Quintana-Orti, Enrique S.; Mayo, Rafael; Iserte, Sergio; Fernández Fernández, Juan Carlos; Dolz, Manuel F. Association for Computing Machinery (ACM) (2012-07)
      In recent years power consumption of high performance computing (HPC) clusters has become a growing problem due, e.g., to the economic cost of electricity, the emission of carbon dioxide (with negative impact on the ...
    • openAccess   Accelerating BST Methods for Model Reduction with Graphics Processors 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer Berlin Heidelberg (2012)
      Model order reduction of a dynamical linear time-invariant system appears in many scientific and engineering applications. Numerically reliable SVD-based methods for this task require O(n^3) floating-point arithmetic operations, ...
    • openAccess   Accelerating Model Reduction of Large Linear Systems with Graphics Processors 

      Benner, Peter; Ezzatti, Pablo; Kressner, Daniel; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer Berlin Heidelberg (2012)
      Model order reduction of a dynamical linear time-invariant system appears in many applications from science and engineering. Numerically reliable SVD-based methods for this task require in general O(n^3) floating-point ...
    • openAccess   Accelerating multi-channel filtering of audio signal on ARM processors 

      Belloch, Jose A.; Alventosa, Juan J.; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S.; Vidal, Antonio M. Springer Verlag (2016-03)
    • closedAccess   Accelerating the Lyapack library using GPUs 

      Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
    • openAccess   Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL 

      Badía, José; Belloch, Jose A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)
      The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...
    • closedAccess   Accelerating the task/data-parallel version of ILUPACK’s BiCG in multi-CPU/GPU configurations 

      Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Elsevier (2019)
      ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on ...
    • openAccess   Accelerating urban scale simulations leveraging local spatial 3D structure 

      Iserte, Sergio; Macias, Aina; Martínez-Cuenca, Raúl; Chiva, Sergio; Paredes, Roberto; Quintana-Orti, Enrique S. Elsevier (2022-06-15)
      This paper presents a hybrid methodology for accelerating Computational Fluid Dynamics (CFD) simulations intertwining inferences from deep neural networks (DNN). The strategy leverages the local spatial data of the velocity ...
    • closedAccess   Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Anzt, Hartwig; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2020-03)
      We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...