Search
Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models
(Springer, 2016-06-21)
Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications by using coprocessors with little effort. These devices offer significant computing power, but their use can ...
Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units
(John Wiley and Sons, 2021)
We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the ...
FaST-LMM for Two-Way Epistasis Tests on High-Performance Clusters
(Mary Ann Liebert, 2018-08)
We introduce a version of the epistasis test in FaST-LMM for clusters of multithreaded processors. This new software maintains the sensitivity of the original FaST-LMM while delivering acceleration that is close to linear ...
Noise estimation for hyperspectral subspace identification on FPGAs
(Springer, 2019-05)
We present a reliable and efficient FPGA implementation of a procedure for the computation of the noise estimation matrix, a key stage for subspace identification of hyperspectral images. Our hardware realization is based ...
Energy Balance between Voltage-Frequency Scaling and Resilience for Linear Algebra Routines on Low-Power Multicore Architectures
(Elsevier, 2017)
Near Threshold Voltage (NTV) computing has been recently proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate ...
Analysis of Threading Libraries for High Performance Computing
(IEEE, 2020-01-30)
With the appearance of multi-many core machines, applications and runtime systems evolved in order to exploit the new on-node concurrency that brought new software paradigms. POSIX threads (Pthreads) was widely adopted for ...
Communication in task-parallel ILU-preconditioned CG solvers using MPI + OmpSs
(Wiley, 2017-11-10)
We target the parallel solution of sparse linear systems via iterative Krylov subspace–based methods enhanced with incomplete LU (ILU)-type preconditioners on clusters of multicore processors. In order to tackle large-scale ...
A framework for genomic sequencing on clusters of multicore and manycore processors
(Sage, 2016-06)
The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target ...
Modeling power consumption of 3D MPDATA and the CG method on ARM and Intel multicore architectures
(Springer-Verlag, 2017-03)
We propose an approach to estimate the power consumption of algorithms, as a function of the frequency and number of cores, using only a very reduced set of real power measures. In addition, we also provide the formulation ...
DMR API: Improving cluster productivity by turning applications into malleable
(Elsevier, 2018)
Adaptive workloads can change the configuration of their jobs on the fly, in terms of number of processes. To carry out these job reconfigurations, we have designed a methodology which enables a job to communicate with ...