    • closedAccess   A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures 

      Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)
      Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...
    • openAccess   Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL 

      Badía, José; Belloch, Jose A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)
      The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...
    • openAccess   Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures 

      Heavner, Nathan; Igual, Francisco; Quintana-Ortí, Gregorio; Martinsson, Gunnar Association for Computing Machinery (ACM) (2022-06)
      Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently ...
    • openAccess   Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures 

      Quintana-Ortí, Gregorio; Hernando, Fernando; Igual, Francisco Association for Computing Machinery (ACM) (2023-03)
      The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of ...
    • openAccess   Analytical Modeling is Enough for High Performance BLIS 

      Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)
      We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...
    • openAccess   Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors 

      Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer US (2016-09)
      Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. ...
    • openAccess   Automatic generation of ARM NEON micro-kernels for matrix multiplication 

      Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)
      General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...
    • closedAccess   Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi 

      Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)
      The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...
    • closedAccess   Color and texture analysis using emerging parallel architectures 

      Igual, Francisco; Mayo, Rafael; Hartley, Timothy; Çatalyürek, Ümit V.; Ruiz, Antonio; Ujaldon, Manuel SAGE Publications (2011-11)
      While image texture is effective for use in pattern-recognition and image-analysis algorithms, textural features are time-consuming to calculate on standard CPUs. Therefore, we present novel implementations of textural-feature ...
    • closedAccess   Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures 

      Bientinesi, Paolo; Igual, Francisco; Kressner, Daniel; Petschow, Matthias; Quintana-Orti, Enrique S. Wiley (2011-11-10)
      We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric ...
    • closedAccess   DVFS-control techniques for dense linear algebra operations on multi-core processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Springer (2012-11)
      This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as ...
    • closedAccess   Enhancing performance and energy consumption of runtime schedulers for dense linear algebra 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2014-06)
      The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate ...
    • closedAccess   Exploiting the capabilities of modern GPUs for dense matrix computations 

      Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley & Sons (2009)
      We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. ...
    • closedAccess   Extending OpenMP to Survive the Heterogeneous Multi-Core Era 

      Ayguadé, Eduardo; Badía Sala, Rosa María; Bellens, Pieter; Cabrera, Daniel; Durán, Alejandro; Ferrer, Roger; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martinell, Luis; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Planas, Judit; Quintana-Orti, Enrique S. Springer US (2010)
      This paper advances the state-of-the-art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired in the StarSs ...
    • openAccess   Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code 

      Hernando, Fernando; Igual, Francisco; Quintana-Ortí, Gregorio Association for Computing Machinery (ACM) (2019-06)
      The minimum distance of an error-correcting code is an important concept in information theory. Hence, computing the minimum distance of a code with a minimum computational cost is crucial to many problems in this area. ...
    • openAccess   Hyperspectral Unmixing on Multicore DSPs: Trading Off Performance for Energy 

      Castillo Catalán, María Isabel; Fernández Fernández, Juan Carlos; Igual, Francisco; Plaza, Antonio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo IEEE (2014)
      Wider coverage of observation missions will increase onboard power restrictions while, at the same time, pose higher demands from the perspective of processing time, thus asking for the exploration of novel high-performance ...
    • closedAccess   Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors 

      Catalán, Sandra; Herrero Zaragoza, José R.; Igual, Francisco; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S.; Adeniyi-Jones, Chris Elsevier (2018-03)
      Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS ...
    • openAccess   Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices 

      Belloch, Jose A.; Badía, José; Igual, Francisco; González, Alberto; Quintana-Orti, Enrique S. IEEE (2018-05)
      Numerous signal processing applications are emerging on both mobile and high-performance computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must ...
    • openAccess   Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance 

      Belloch, Jose A.; Badía, José; Igual, Francisco; Cobos, Maximo IEEE (2019-06)
      The rapid development of the Internet of Things (IoT) has posed important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been ...
    • openAccess   Programming parallel dense matrix factorizations with look-ahead and OpenMP 

      Catalán, Sandra; Castelló, Adrián; Igual, Francisco; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer (2019)
      We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded ...