DMR API: Improving cluster productivity by turning applications into malleable
Other documents by the authors: Iserte, Sergio; Mayo, Rafael; Quintana-Orti, Enrique S.; Beltrán, Vicenç; Peña Monferrer, Antonio J.
Metadata
Title: DMR API: Improving cluster productivity by turning applications into malleable
Authors: Iserte, Sergio; Mayo, Rafael; Quintana-Orti, Enrique S.; Beltrán, Vicenç; Peña Monferrer, Antonio J.
Publication date: 2018
Publisher: Elsevier
ISSN: 0167-8191
Bibliographic citation: ISERTE, Sergio, et al. DMR API: Improving cluster productivity by turning applications into malleable. Parallel Computing, 2018, vol. 78, p. 54-66.
Document type: info:eu-repo/semantics/article
Publisher's version: https://www.sciencedirect.com/science/article/pii/S0167819118302229
Version: info:eu-repo/semantics/submittedVersion
Abstract
Adaptive workloads can change the configuration of their jobs on the fly, in terms of the number of processes. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager (aware of the queue of jobs and the resource allocation) and the parallel runtime (able to transparently handle the processes and the program data) is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns the appropriate action: i) expand, if there are spare resources; ii) shrink, if queued jobs can be initiated; or iii) none, if no change can improve the global productivity. In this paper, we describe the internals of our framework and demonstrate how it reduces the global workload completion time while making more efficient use of the underlying resources. For this purpose, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments.
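
The three-way decision described in the abstract can be illustrated with a small sketch. This is not the DMR API or the authors' scheduling policy: the names `ClusterStatus`, `decide_action`, `min_nodes`, and `max_nodes`, and the exact rule used to choose between the actions, are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    EXPAND = "expand"
    SHRINK = "shrink"
    NONE = "none"


@dataclass
class ClusterStatus:
    """Hypothetical snapshot of what a resource manager knows."""
    idle_nodes: int          # nodes not allocated to any job
    queued_jobs: List[int]   # node request of each queued job


def decide_action(status: ClusterStatus, job_nodes: int,
                  min_nodes: int, max_nodes: int) -> Action:
    """Return an action for a running job that triggers a reconfiguration.

    Assumed policy (illustrative only):
    - shrink if releasing nodes would let a queued job start,
    - expand if idle nodes exist that no queued job can use,
    - otherwise do nothing.
    """
    releasable = job_nodes - min_nodes

    # Shrink: some queued job does not fit now but would fit after release.
    for request in status.queued_jobs:
        if status.idle_nodes < request <= status.idle_nodes + releasable:
            return Action.SHRINK

    # Expand: spare nodes that no queued job could start on right now.
    idle_usable_by_queue = any(r <= status.idle_nodes for r in status.queued_jobs)
    if status.idle_nodes > 0 and job_nodes < max_nodes and not idle_usable_by_queue:
        return Action.EXPAND

    return Action.NONE
```

For example, under these assumptions a 4-node job that may run on 2 to 8 nodes would be told to shrink when the cluster has no idle nodes and a 2-node job is waiting, and to expand when 4 nodes are idle and the queue is empty.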
Published in: Parallel Computing 78 (2018)
Research project: TIN2014-53495-R and TIN2015-65316-P
Access rights:
http://rightsstatements.org/vocab/CNE/1.0/
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [413]
- EMC_Articles [803]