• openAccess   A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting 

      Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael; Van de Geijn, Robert A. IEEE (2019-01)
      We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target ...
    • closedAccess   A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures 

      Quintana-Ortí, Gregorio; Igual, Francisco D.; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)
      Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
    • openAccess   An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization 

      Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)
      We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...
    • openAccess   Deriving dense linear algebra libraries 

      Bientinesi, Paolo; Gunnels, John A.; Myers, Margaret E.; Quintana-Orti, Enrique S.; Rhodes, Tyler; Van de Geijn, Robert A.; Van Zee, Field G. Springer London (2013-11)
      Starting in the late 1960s computer scientists including Dijkstra and Hoare advocated goal-oriented programming and the formal derivation of algorithms. The chief impediment to realizing this for loop-based programs was ...
    • closedAccess   Families of Algorithms for Reducing a Matrix to Condensed Form 

      Van Zee, Field G.; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio; Elizondo, G. Joseph ACM (2012-11)
      In a recent paper it was shown how memory traffic can be diminished by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocess when computing the singular values of a dense matrix. The ...
    • openAccess   Householder QR Factorization With Randomization for Column Pivoting (HQRRP) 

      Martinsson, Gunnar; Quintana-Ortí, Gregorio; Heavner, Nathan; Van de Geijn, Robert A. Society for Industrial and Applied Mathematics (2017)
      A fundamental problem when adding column pivoting to the Householder QR factorization is that only about half of the computation can be cast in terms of high performing matrix-matrix multiplications, which greatly ...
    • openAccess   Programming matrix algorithms-by-blocks for thread-level parallelism 

      Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Van de Geijn, Robert A.; Van Zee, Field G.; Chan, Ernie Association for Computing Machinery (2009-07)
      With the emergence of thread-level parallelism as the primary means for continued improvement of performance, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that ...
    • openAccess   Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance 

      Van Zee, Field G.; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio ACM Digital Library (2014-04)
      We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly ...
    • closedAccess   Scheduling algorithms-by-blocks on small clusters 

      Igual, Francisco D.; Quintana-Ortí, Gregorio; Van de Geijn, Robert A. Wiley (2012-03-28)
      The arrival of multicore architectures has generated an interest in reformulating dense matrix computations as algorithms-by-blocks, where submatrices are units of data and computations with those blocks are units of ...
    • closedAccess   The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations 

      Igual, Francisco D.; Chan, Ernie; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio; Van de Geijn, Robert A. Elsevier (2012)
      Parallel accelerators are playing an increasingly important role in scientific computing. However, it is perceived that their weakness nowadays is their reduced “programmability” in comparison with traditional general-purpose ...
    • openAccess   The libflame library for dense matrix computations 

      Van Zee, Field G.; Chan, Ernie; Van de Geijn, Robert A.; Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S. IEEE Computer Society (2009-11)
      Researchers from the Formal Linear Algebra Method Environment (Flame) project have developed new methodologies for analyzing, designing, and implementing linear algebra libraries. These solutions, which have culminated in ...
    • closedAccess   Using desktop computers to solve large-scale dense linear algebra problems 

      Quintana-Orti, Enrique S.; Marqués-Andrés, Mercedes; Quintana-Ortí, Gregorio; Van de Geijn, Robert A. Springer Science+Business Media (2011-11)
      We provide experimental evidence that current desktop computers feature enough computational power to solve large-scale dense linear algebra problems. While the high computational cost of the numerical methods for solving ...