Listar UJI: Investigación por autoría "7c1c0bf7-9583-4bf7-b497-213e4df60ccb"

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures

Ayguadé, Eduardo; Badía Sala, Rosa María; Cabrera, Daniel; Durán, Alejandro; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)

OpenMP has evolved recently towards expressing unstructured parallelism, targeting the parallelization of a broader range of applications in the current multicore era. Homogeneous multicore architectures from major vendors ...

A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)

Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...

Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL

Badía, José; BELLOCH, JOSE A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)

The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...

Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures

Heavner, Nathan; Igual, Francisco; Quintana-Ortí, Gregorio; MARTINSSON, GUNNAR Association for Computing Machinery (ACM) (2022-06)

Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently ...

Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures

Quintana-Ortí, Gregorio; Hernando, Fernando; Igual, Francisco Association for Computing Machinery (ACM) (2023-03)

The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of ...

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

Ayguadé, Eduardo; Badía Sala, Rosa María; Igual, Francisco; Labarta Mancho, Jesús; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2009)

While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized ...

Analytical Modeling is Enough for High Performance BLIS

Low, Tze Meng; Igual, Francisco; Smith, Tyler M.; Quintana-Orti, Enrique S. ACM (2016-09)

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning ...

Architecture-Aware Con guration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S. Springer US (2016-09)

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. ...

Attaining High Performance in General-Purpose Computations on Current Graphics Processors

Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)

The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate ...

Automatic generation of ARM NEON micro‑kernels for matrix multiplication

Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)

General matrix multiplication (gemm) is a fundamental kernel in scientifc computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S. Elsevier (2015-08)

The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of systems. In particular, the exploitation of manycore ...

Color and texture analysis using emerging parallel architectures

Igual, Francisco; Mayo, Rafael; Hartley, Timothy; Çatalyürek, Ümit V.; Ruiz, Antonio; Ujaldon, Manuel SAGE Publications (2011-11)

While image texture is effective for use in pattern-recognition and image-analysis algorithms, textural features are time-consuming to calculate on standard CPUs. Therefore, we present novel implementations of textural-feature ...

Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures

Bientinesi, Paolo; Igual, Francisco; Kressner, Daniel; Petschow, Matthias; Quintana-Orti, Enrique S. Wiley (2011-11-10)

We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric ...

DVFS-control techniques for dense linear algebra operations on multi-core processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Springer (2012-11)

This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as ...

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra

Alonso-Jordá, Pedro; Dolz, Manuel F.; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2014-06)

The road towards Exascale Computing requires a holistic effort to address three different challenges simultaneously: high performance, energy efficiency, and programmability. The use of runtime task schedulers to orchestrate ...

Evaluation and Tuning of the Level 3 CUBLAS for Graphics Processors

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-01)

The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the ...

Exploiting the capabilities of modern GPUs for dense matrix computations

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley & Sons (2009)

We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. ...

Extending OpenMP to Survive the Heterogeneous Multi-Core Era

Ayguadé, Eduardo; Badía Sala, Rosa María; Bellens, Pieter; Cabrera, Daniel; Durán, Alejandro; Ferrer, Roger; González, Marc; Igual, Francisco; Jiménez González, Daniel; Labarta Mancho, Jesús; Martinell, Luis; Martorell, Xavier; Mayo, Rafael; Pérez, Josep M.; Planas, Judit; Quintana-Orti, Enrique S. Springer US (2010)

This paper advances the state-of-the-art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired in the StarSs ...

Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code

Hernando, Fernando; Igual, Francisco; Quintana-Ortí, Gregorio Association for Computing Machinery (ACM) (2019-06)

The minimum distance of an error-correcting code is an important concept in information theory. Hence, computing the minimum distance of a code with a minimum computational cost is crucial to many problems in this area. ...

GLAME@lab: An M-script API for Linear Algebra Operations on Graphics Processors

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S. Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I (2008-02)

We propose two high-level application programming interfaces (APIs) to use a graphics processing unit (GPU) as a coprocessor for dense linear algebra operations. Combined with an extension of the FLAME API and an ...