• closedAccess   A customized precision format based on mantissa segmentation for accelerating sparse linear algebra 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Göbel, Fritz; Anzt, Hartwig Wiley (2019)
      In this work, we pursue the idea of radically decoupling the floating point format used for arithmetic operations from the format used to store the data in memory. We complement this idea with a customized precision memory ...
    • openAccess   Accelerating BST Methods for Model Reduction with Graphics Processors 

      Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer Berlin Heidelberg (2012)
      Model order reduction of a dynamical linear time-invariant system appears in many scientific and engineering applications. Numerically reliable SVD-based methods for this task require O(n³) floating-point arithmetic operations, ...
    • openAccess   Accelerating Model Reduction of Large Linear Systems with Graphics Processors 

      Benner, Peter; Ezzatti, Pablo; Kressner, Daniel; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer Berlin Heidelberg (2012)
      Model order reduction of a dynamical linear time-invariant system appears in many applications from science and engineering. Numerically reliable SVD-based methods for this task require in general O(n³) floating-point ...
    • closedAccess   Accelerating the Lyapack library using GPUs 

      Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)
      Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
    • openAccess   Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL 

      Badía, José; Belloch, Jose A.; Cobos, Maximo; Igual, Francisco D.; Quintana-Orti, Enrique S. Springer (2019-03)
      The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...
    • closedAccess   Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Anzt, Hartwig; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2020-03)
      We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...
    • closedAccess   An efficient GPU version of the preconditioned GMRES method 

      Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2019-03)
      In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative ...
    • openAccess   Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs 

      Aliaga Estellés, José Ignacio; Anzt, Hartwig; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique; Tsai, Yuhsiang M. Springer (2021)
      We contribute to the optimization of the sparse matrix-vector product on graphics processing units by introducing a variant of the coordinate sparse matrix layout that compresses the integer representation of the matrix ...
    • closedAccess   Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures 

      Bientinesi, Paolo; Igual, Francisco D.; Kressner, Daniel; Petschow, Matthias; Quintana-Orti, Enrique S. Wiley (2011-11-10)
      We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric ...
    • openAccess   Consumo energético de métodos iterativos para sistemas dispersos en procesadores gráficos 

      Pérez Badenes, Joaquín Universitat Jaume I (2016-12-09)
      The solution of large sparse systems of linear equations is one of the most common operations in scientific and engineering applications. Their growing size drives the development of techniques ...
    • openAccess   Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models 

      Castelló, Adrián; Pena, Antonio J.; Mayo, Rafael; Planas, Judit; Quintana-Orti, Enrique S.; Balaji, Pavan Springer (2016-06-21)
      Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications by using coprocessors with little effort. These devices offer significant computing power, but their use can ...
    • openAccess   Extending lyapack for the solution of band Lyapunov equations on hybrid CPU–GPU platforms 

      Benner, Peter; Remón Gómez, Alfredo; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer Verlag (2015)
      The solution of large-scale Lyapunov equations is an important tool for the solution of several engineering problems arising in optimal control and model order reduction. In this work, we investigate the case when the ...
    • openAccess   FaST-LMM for Two-Way Epistasis Tests on High-Performance Clusters 

      Martínez Pérez, Héctor; Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Quintana-Orti, Enrique S.; Rambla, Jordi; Farré, Xavier; Navarro, Arcadi Mary Ann Liebert (2018-08)
      We introduce a version of the epistasis test in FaST-LMM for clusters of multithreaded processors. This new software maintains the sensitivity of the original FaST-LMM while delivering acceleration that is close to linear ...
    • openAccess   Hierarchical approach for deriving a reproducible unblocked LU factorization 

      Iakymchuk, Roman; Graillat, Stef; Defour, David; Quintana-Orti, Enrique S. Sage (2019-03-17)
      We propose a reproducible variant of the unblocked LU factorization for graphics processor units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for ...
    • closedAccess   Load-balancing Sparse Matrix Vector Product Kernels on GPUs 

      Anzt, Hartwig; Cojean, Terry; Yen-Chen, Chen; Dongarra, Jack; Flegar, Goran; Nayak, Pratik; Tomov, Stanimire; Tsai, Yuhsiang M.; Wang, Weichung Association for Computing Machinery (ACM) (2020-03)
      Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational ...
    • closedAccess   Out-of-core macromolecular simulations on multithreaded architectures 

      Aliaga Estellés, José Ignacio; Badía, José; Castillo Catalán, María Isabel; Davidovic, Davor; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2015)
      We address the solution of large-scale eigenvalue problems that appear in the motion simulation of complex macromolecules on multithreaded platforms, consisting of multicore processors and possibly a graphics processor ...
    • openAccess   Solving Matrix Equations on Multi-Core and Many-Core Architectures 

      Benner, Peter; Ezzatti, Pablo; Mena, Hermann; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo MDPI (2013-12)
      We address the numerical solution of Lyapunov, algebraic and differential Riccati equations, via the matrix sign function, on platforms equipped with general-purpose multicore processors and, optionally, one or more graphics ...
    • openAccess   Solving “Large” Dense Matrix Problems on Multi-Core Processors and GPUs 

      Marqués-Andrés, Mercedes; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2009-01)
      Few realize that, for large matrices, many dense matrix computations achieve nearly the same performance when the matrices are stored on disk as when they are stored in a very large main memory. Similarly, few realize ...
    • openAccess   Toward a modular precision ecosystem for high-performance computing 

      Anzt, Hartwig; Flegar, Goran; Grützmacher, Thomas; Quintana-Orti, Enrique S. Sage (2019-05)
      With the memory bandwidth of current computer architectures being significantly slower than the (floating point) arithmetic performance, many scientific computations only leverage a fraction of the computational power in ...
    • openAccess   Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction 

      Benner, Peter; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer International Publishing AG (2015-12)
      Linear algebra operations arise in a myriad of scientific and engineering applications and, therefore, their optimization is targeted by a significant number of high performance computing (HPC) research efforts. In particular, ...