A complete and efficient CUDA-sharing solution for HPC clusters

Peña Monferrer, Antonio J.; Reaño, Carlos; Silla, Federico; Mayo, Rafael; Quintana-Orti, Enrique S.; Duato, José

dc.contributor.author	Peña Monferrer, Antonio J.
dc.contributor.author	Reaño, Carlos
dc.contributor.author	Silla, Federico
dc.contributor.author	Mayo, Rafael
dc.contributor.author	Quintana-Orti, Enrique S.
dc.contributor.author	Duato, José
dc.date.accessioned	2015-06-24T14:15:07Z
dc.date.available	2015-06-24T14:15:07Z
dc.date.issued	2014
dc.identifier.issn	0167-8191
dc.identifier.uri	http://hdl.handle.net/10234/125124
dc.description.abstract	In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes, as well as permits a single node to exploit the whole set of GPUs installed in the cluster. In our proposal, CUDA applications can seamlessly interact with any GPU in the cluster, independently of its physical location. Thus, GPUs can be either distributed among compute nodes or concentrated in dedicated GPGPU servers, depending on the cluster administrator’s policy. This proposal leads to savings not only in space but also in energy, acquisition, and maintenance costs. The performance evaluation in this paper with a series of benchmarks and a production application clearly demonstrates the viability of this proposal. Concretely, experiments with the matrix–matrix product reveal excellent performance compared with regular executions on the local GPU; on a much more complex application, the GPU-accelerated LAMMPS, we attain up to 11x speedup employing 8 remote accelerators from a single node with respect to a 12-core CPU-only execution. GPGPU service interaction in compute nodes, remote acceleration in dedicated GPGPU servers, and data transfer performance of similar GPU virtualization frameworks are also evaluated.	ca_CA
dc.format.extent	15 p.	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Elsevier	ca_CA
dc.relation.isPartOf	Parallel Computing Volume 40, Issue 10, December 2014	ca_CA
dc.rights	© 2014 Elsevier B.V. All rights reserved.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	Graphics processors	ca_CA
dc.subject	Virtualization	ca_CA
dc.subject	High performance computing	ca_CA
dc.subject	Clusters	ca_CA
dc.title	A complete and efficient CUDA-sharing solution for HPC clusters	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	http://dx.doi.org/10.1016/j.parco.2014.09.011
dc.rights.accessRights	info:eu-repo/semantics/restrictedAccess	ca_CA
dc.relation.publisherVersion	http://www.sciencedirect.com/science/article/pii/S0167819114001227#	ca_CA

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

ICC_Articles [419]
