Enabling big data analytics in the hybrid cloud using iterative MapReduce

Clemente-Castelló, Francisco J.; Bogdan, Nicolae; Katrinis, Kostas; Rafique, M. Mustafa; Mayo, Rafael; Fernández Fernández, Juan Carlos; Loreti, Daniela

dc.contributor.author	Clemente-Castelló, Francisco J.
dc.contributor.author	Bogdan, Nicolae
dc.contributor.author	Katrinis, Kostas
dc.contributor.author	Rafique, M. Mustafa
dc.contributor.author	Mayo, Rafael
dc.contributor.author	Fernández Fernández, Juan Carlos
dc.contributor.author	Loreti, Daniela
dc.date.accessioned	2016-04-27T10:07:34Z
dc.date.available	2016-04-27T10:07:34Z
dc.date.issued	2015-12
dc.identifier.citation	CLEMENTE CASTELLÓ, Francisco José; BOGDAN, Nicolae; KATRINIS, Kostas; RAFIQUE, M. Mustafa; MAYO, Rafael; FERNÁNDEZ FERNÁNDEZ, Juan Carlos; LORETI, Daniela. Enabling big data analytics in the hybrid cloud using iterative MapReduce. UCC'15: The 8th IEEE/ACM International Conference on Utility and Cloud Computing, Dec 2015, Limassol, Cyprus. < hal-01207186 >	ca_CA
dc.identifier.uri	http://hdl.handle.net/10234/158986
dc.description.abstract	The cloud computing model has seen tremendous commercial success through its materialization via two prominent models to date, namely public and private cloud. Recently, a third model combining the former two service models as on-/off-premise resources has been receiving significant market traction: hybrid cloud. While state-of-the-art techniques that address workload performance prediction and efficient workload execution over hybrid cloud setups exist, work on data-intensive workloads, including Big Data Analytics, in similar environments is still nascent. This paper addresses this gap by taking on the challenge of bursting over hybrid clouds to accelerate iterative MapReduce applications. We first specify the challenges associated with data locality and data movement in such setups. Subsequently, we propose a novel technique to address the locality issue, without requiring changes to the MapReduce framework or the underlying storage layer. In addition, we contribute a performance prediction methodology that combines modeling with micro-benchmarks to estimate completion time for iterative MapReduce applications, which enables users to estimate cost-to-solution before committing extra resources from public clouds. We show through experimentation in a dual-OpenStack hybrid cloud setup that our solutions bring substantial improvement with predictable cost control for two real-life iterative MapReduce applications: large-scale machine learning and text analysis.	ca_CA
dc.format.extent	10 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	HAL-Inria	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/CNE/1.0/	*
dc.subject	Hybrid Cloud	ca_CA
dc.subject	Big Data Analytics	ca_CA
dc.subject	Iterative Applications	ca_CA
dc.subject	MapReduce	ca_CA
dc.subject	Data locality	ca_CA
dc.subject	Performance Prediction	ca_CA
dc.title	Enabling big data analytics in the hybrid cloud using iterative MapReduce	ca_CA
dc.type	info:eu-repo/semantics/conferenceObject	ca_CA
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	https://hal.inria.fr/hal-01207186/en	ca_CA

Files in this item

Name: Clemente_2016_Enabling.pdf
Size: 478.5Kb
Format: PDF
Description: Conference

This item appears in the following collection(s)

ICC_Congressos i conferències [79]