Browse ICC_Articles by author "4eeb2085-f242-47ae-82f0-6c88ea5c4680"

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting

Catalán, Sandra; Herrero Zaragoza, José R.; Quintana-Orti, Enrique S.; Rodríguez Sánchez, Rafael; Van de Geijn, Robert A. IEEE (2019-01)

We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target ...

A complete and efficient CUDA-sharing solution for HPC clusters

Peña Monferrer, Antonio J.; Reaño, Carlos; Silla, Federico; Mayo, Rafael; Quintana-Orti, Enrique S.; Duato, José Elsevier (2014)

In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from ...

A factored variant of the Newton iteration for the solution of algebraic Riccati equations via the matrix sign function

Benner, Peter; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)

In this paper we introduce a variant of the Newton iteration for the matrix sign function that results in an efficient numerical solver for a certain class of algebraic Riccati equations (AREs). In particular, when the ...

A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

Aliaga Estellés, José Ignacio; Alonso-Jordá, Pedro; Badía, José; Chacón, Pablo; Davidovic, Davor; López Blanco, José R.; Quintana-Orti, Enrique S. Elsevier (2016-03-15)

We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). ...

A framework for genomic sequencing on clusters of multicore and manycore processors

Martínez Pérez, Héctor; Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Tárraga, Joaquín; Medina, Ignacio; Dopazo, Joaquín; Quintana-Orti, Enrique S. Sage (2016-06)

The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target ...

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU–GPU platforms

Benner, Peter; Ezzatti, Pablo; Kressner, Daniel; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Elsevier (2011)

We describe a hybrid Lyapunov solver based on the matrix sign function, where the intensive parts of the computation are accelerated using a graphics processor (GPU) while executing the remaining operations on a general-purpose ...

A parallel solver for huge dense linear systems

Badía, José; Movilla, Jose L.; Climente, Juan I.; Castillo Catalán, María Isabel; Marqués-Andrés, Mercedes; Mayo, Rafael; Quintana-Orti, Enrique S.; Planelles, Josep Elsevier (2011-11)

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to ...

A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Quintana-Ortí, Gregorio; Igual, Francisco; Marqués-Andrés, Mercedes; Quintana-Orti, Enrique S.; Van de Geijn, Robert A. ACM (2012-08)

Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how ...

A simulator to assess energy saving strategies and policies in HPC workloads

Quintana-Orti, Enrique S.; Mayo, Rafael; Iserte, Sergio; Fernández Fernández, Juan Carlos; Dolz, Manuel F. Association for Computing Machinery (ACM) (2012-07)

In recent years power consumption of high performance computing (HPC) clusters has become a growing problem due, e.g., to the economic cost of electricity, the emission of carbon dioxide (with negative impact on the ...

Accelerating multi-channel filtering of audio signal on ARM processors

Belloch, Jose A.; Alventosa, Juan J.; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S.; Vidal, Antonio M. Springer Verlag (2016-03)

Accelerating the Lyapack library using GPUs

Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo Springer (2013)

Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...

Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL

Badía, José; Belloch, Jose A.; Cobos, Maximo; Igual, Francisco; Quintana-Orti, Enrique S. Springer (2019-03)

The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a ...

Accelerating the task/data-parallel version of ILUPACK’s BiCG in multi-CPU/GPU configurations

Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Elsevier (2019)

ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on ...

Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation

Grützmacher, Thomas; Cojean, Terry; Flegar, Goran; Anzt, Hartwig; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2020-03)

We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...

Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

Aliaga Estellés, José Ignacio; Barreda Vayá, Maria; Castaño Álvarez, María Asunción; Dolz, Manuel F.; Quintana-Orti, Enrique S. Springer Verlag (2015)

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. On top of this analysis, we propose and evaluate several strategies to ...

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

Flegar, Goran; Anzt, Hartwig; Cojean, Terry; Quintana-Orti, Enrique S. Association for Computing Machinery (ACM) (2021-04)

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in ...

Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers

Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Higham, Nicholas J.; Quintana-Orti, Enrique S. Wiley (2019-03-25)

We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). ...

An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

Quintana-Ortí, Gregorio; Quintana-Orti, Enrique S.; Remón Gómez, Alfredo; Van de Geijn, Robert A. Springer Verlag (2008)

We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large ...

An efficient GPU version of the preconditioned GMRES method

Aliaga Estellés, José Ignacio; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S. Springer (2019-03)

In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative ...

Analysis of Threading Libraries for High Performance Computing

Castelló, Adrián; Mayo, Rafael; Seo, Sangmin; Balaji, Pavan; Quintana-Orti, Enrique S.; Peña Monferrer, Antonio J. IEEE (2020-01-30)

With the appearance of multi-many core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency that brought new software paradigms. POSIX threads (Pthreads) was widely adopted for ...