    • closedAccess   A customized precision format based on mantissa segmentation for accelerating sparse linear algebra 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Göbel, Fritz; Anzt, Hartwig
      Wiley (2019)
      In this work, we pursue the idea of radically decoupling the floating point format used for arithmetic operations from the format used to store the data in memory. We complement this idea with a customized precision memory ...
    • closedAccess   Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation 

      Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Anzt, Hartwig; Quintana-Orti, Enrique S.
      Association for Computing Machinery (ACM) (2020-03)
      We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...
    • openAccess   Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers 

      Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Higham, Nicholas J.; Quintana-Orti, Enrique S.
      Wiley (2019-03-25)
      We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). ...
    • openAccess   Compressed basis GMRES on high-performance graphics processing units 

      Aliaga Estellés, José Ignacio; Anzt, Hartwig; Tomás Domínguez, Andrés Enrique; Quintana-Orti, Enrique S.; Grützmacher, Thomas
      Sage (2022-08-05)
      Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is ...
    • openAccess   Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units 

      Aliaga Estellés, José Ignacio; Anzt, Hartwig; Grützmacher, Thomas; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique
      John Wiley and Sons (2021)
      We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the ...
    • closedAccess   Fine-grained bit-flip protection for relaxation methods 

      Anzt, Hartwig; Dongarra, Jack; Quintana-Ortí, Gregorio
      Elsevier (2019-09)
      Resilience is considered a challenging under-addressed issue that the high performance computing community (HPC) will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part ...
    • closedAccess   Load-balancing Sparse Matrix Vector Product Kernels on GPUs 

      Anzt, Hartwig; Cojean, Terry; Yen-Chen, Chen; Dongarra, Jack; Flegar, Goran; Nayak, Pratik; Tomov, Stanimire; Tsai, Yuhsiang M.; Wang, Weichung
      Association for Computing Machinery (ACM) (2020-03)
      Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational ...
    • openAccess   Toward a modular precision ecosystem for high-performance computing 

      Anzt, Hartwig; Flegar, Goran; Grützmacher, Thomas; Quintana-Orti, Enrique S.
      Sage (2019-05)
      With the memory bandwidth of current computer architectures being significantly slower than the (floating point) arithmetic performance, many scientific computations only leverage a fraction of the computational power in ...
    • closedAccess   Unveiling the performance-energy trade-off in iterative linear system solvers for multithreaded processors 

      Aliaga Estellés, José Ignacio; Anzt, Hartwig; Castillo Catalán, María Isabel; Fernández Fernández, Juan Carlos; León Navarro, Germán; Pérez, Joaquín; Quintana-Orti, Enrique S.
      Wiley (2015)
      In this paper, we analyze the interactions occurring in the triangle performance-power-energy for the execution of a pivotal numerical algorithm, the iterative conjugate gradient (CG) method, on a diverse collection ...
    • closedAccess   Variable-size batched Gauss–Jordan elimination for block-Jacobi preconditioning on graphics processors 

      Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Quintana-Orti, Enrique S.
      Elsevier (2019)
      In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully ...