• closedAccess   A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Cabrera, Daniel; Durán, Alejandro; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)
      OpenMP has evolved recently towards expressing unstructured parallelism, targeting the parallelization of a broader range of applications in the current multicore era. Homogeneous multicore architectures from major vendors ...
    • closedAccess   A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures 

      Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)
      Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
    • openAccess   Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL 

      Badía, José; BELLOCH, JOSE A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)
      The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...
    • openAccess   Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures 

      Heavner, Nathan; Igual, Francisco; Quintana-Ortí, Gregorio; MARTINSSON, GUNNAR Association for Computing Machinery (ACM) (2022-06)
      Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently ...
    • openAccess   Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures 

      Quintana-Ortí, Gregorio; Hernando, Fernando; Igual, Francisco Association for Computing Machinery (ACM) (2023-03)
      The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of ...
    • openAccess   An Extension of the StarSs Programming Model for Platforms with Multiple GPUs 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Igual, Francisco; Labarta Mancho, Jesús; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)
      While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized ...
    • openAccess   Analytical Modeling is Enough for High Performance BLIS 

      Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)
      We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
    • openAccess   Architecture-Aware Con guration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors 

      Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer US (2016-09)
      Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. ...
    • openAccess   Attaining High Performance in General-Purpose Computations on Current Graphics Processors 

      Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)
      The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...
    • openAccess   Automatic generation of ARM NEON micro‑kernels for matrix multiplication 

      Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)
      General matrix multiplication (gemm) is a fundamental kernel in scientifc computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...
    • closedAccess   Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi 

      Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)
      The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...
    • closedAccess   Color and texture analysis using emerging parallel architectures 

      Igual, Francisco; Mayo, Rafael; Hartley, Timothy; Çatalyürek, Ümit V.; Ruiz, Antonio; Ujaldon, Manuel SAGE Publications (2011-11)
      While image texture is effective for use in pattern-recognition and image-analysis algorithms, textural features are time-consuming to calculate on standard CPUs. Therefore, we present novel implementations of textural-feature ...
    • closedAccess   Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures 

      Bientinesi, Paolo; Igual, Francisco; Kressner, Daniel; Petschow, Matthias; Quintana-Orti, Enrique S. Wiley (2011-11-10)
      We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric ...
    • closedAccess   DVFS-control techniques for dense linear algebra operations on multi-core processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Springer (2012-11)
      This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as ...
    • closedAccess   Enhancing performance and energy consumption of runtime schedulers for dense linear algebra 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2014-06)
      The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate ...
    • openAccess   Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)
      The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the ...
    • closedAccess   Exploiting the capabilities of modern GPUs for dense matrix computations 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley & Sons (2009)
      We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. ...
    • closedAccess   Extending OpenMP to Survive the Heterogeneous Multi-Core Era 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Bellens, Pieter; Cabrera, Daniel; Durán, Alejandro; Ferrer, Roger; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martinell, Luis; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Planas, Judit; Quintana-Orti, Enrique S. Springer US (2010)
      This paper advances the state-of-the-art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired in the StarSs ...
    • openAccess   Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code 

      Hernando, Fernando; Igual, Francisco; Quintana-Ortí, Gregorio Association for Computing Machinery (ACM) (2019-06)
      The minimum distance of an error-correcting code is an important concept in information theory. Hence, computing the minimum distance of a code with a minimum computational cost is crucial to many problems in this area. ...
    • openAccess   GLAME@lab: An M-script API for Linear Algebra Operations on Graphics Processors 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)
      We propose two high-level application programming interfaces (APIs) to use a graphics processing unit (GPU) as a coprocessor for dense linear algebra operations. Combined with an extension of the FLAME API and an ...