Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting

Clemente-Castelló, Francisco J.; Bogdan, Nicolae; Mayo, Rafael; Fernández Fernández, Juan Carlos

dc.contributor.author	Clemente-Castelló, Francisco J.
dc.contributor.author	Bogdan, Nicolae
dc.contributor.author	Mayo, Rafael
dc.contributor.author	Fernández Fernández, Juan Carlos
dc.date.accessioned	2018-10-16T09:52:58Z
dc.date.available	2018-10-16T09:52:58Z
dc.date.issued	2018-02
dc.identifier.citation	CLEMENTE-CASTELLO, Francisco J., et al. Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting. IEEE Transactions on Parallel and Distributed Systems, 2018.	ca_CA
dc.identifier.uri	http://hdl.handle.net/10234/176782
dc.description.abstract	Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall capacity during peak utilization) can be a cost-effective way to deal with the increasing complexity of big data analytics, especially for iterative applications. However, the low throughput, high latency network link between the on-premise and off-premise resources (“weak link”) makes maintaining scalability difficult. While several data locality techniques have been designed for big data bursting on hybrid clouds, their effectiveness is difficult to estimate in advance. Yet such estimations are critical, because they help users decide whether the extra pay-as-you-go cost incurred by using the off-premise resources justifies the runtime speed-up. To this end, the current paper presents a performance model and methodology to estimate the runtime of iterative MapReduce applications in a hybrid cloud-bursting scenario. The paper focuses on the overhead incurred by the weak link at fine granularity, for both the map and the reduce phases. This approach enables high estimation accuracy, as demonstrated by extensive experiments at scale using a mix of real-world iterative MapReduce applications from standard big data benchmarking suites that cover a broad spectrum of data patterns. Not only are the produced estimations accurate in absolute terms compared with experimental results, but they are also up to an order of magnitude more accurate than applying state-of-the-art estimation approaches originally designed for single-site MapReduce deployments.	ca_CA
dc.format.extent	14 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	IEEE	ca_CA
dc.rights	© Copyright 2018 IEEE - All rights reserved.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	hybrid cloud	ca_CA
dc.subject	big data analytics	ca_CA
dc.subject	iterative applications	ca_CA
dc.subject	MapReduce	ca_CA
dc.subject	performance prediction	ca_CA
dc.subject	runtime estimation	ca_CA
dc.title	Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	http://dx.doi.org/10.1109/TPDS.2018.2802932
dc.relation.projectID	U.S. Department of Energy, Office of Science (DE-AC02-06CH11357) ; Spanish CICYT (projects TIN2014-53495-R and TIN2017-82972-R)	ca_CA
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	https://ieeexplore.ieee.org/abstract/document/8283575	ca_CA
dc.type.version	info:eu-repo/semantics/acceptedVersion	ca_CA

Files in this item

Name: IEEE_TRANSACTIONS_ON_PARALLEL_ ...
Size: 1.598Mb
Format: PDF
Description: preprint version

This item appears in the following collection(s)

ICC_Articles [424]
