A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting
(IEEE, 2019-01)
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target ...
iMODS: internal coordinates normal mode analysis server
(Oxford University Press, 2014)
Normal mode analysis (NMA) in internal (dihedral) coordinates naturally reproduces the collective functional motions of biological macromolecules. iMODS facilitates the exploration of such modes and generates feasible ...
Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation
(Association for Computing Machinery (ACM), 2020-03)
We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. ...
Deriving dense linear algebra libraries
(Springer London, 2013-11)
Starting in the late 1960s, computer scientists including Dijkstra and Hoare advocated goal-oriented programming and the formal derivation of algorithms. The chief impediment to realizing this for loop-based programs was ...
DMRlib: Easy-coding and Efficient Resource Management for Job Malleability
(IEEE, 2020-09-09)
Process malleability has proved to have a highly positive impact on the resource utilization and global productivity in data centers compared with the conventional static resource allocation policy. However, the non-negligible ...
Accelerating the Lyapack library using GPUs
(Springer, 2013)
Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend ...
Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
(Springer Verlag, 2019)
We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). Concretely, in the ...
Toward a modular precision ecosystem for high-performance computing
(Sage, 2019-05)
With the memory bandwidth of current computer architectures being significantly slower than the (floating point) arithmetic performance, many scientific computations only leverage a fraction of the computational power in ...
A complete and efficient CUDA-sharing solution for HPC clusters
(Elsevier, 2014)
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from ...
Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs
(Elsevier, 2014)
In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major ...