Architecture-Aware Configuration and Scheduling of
Matrix Multiplication on
Asymmetric Multicore Processors

Catalán, Sandra; Igual, Francisco; Mayo, Rafael; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S.

dc.contributor.author	Catalán, Sandra
dc.contributor.author	Igual, Francisco
dc.contributor.author	Mayo, Rafael
dc.contributor.author	Rodríguez Sánchez, Rafael
dc.contributor.author	Quintana-Orti, Enrique S.
dc.date.accessioned	2016-10-13T16:07:29Z
dc.date.available	2016-10-13T16:07:29Z
dc.date.issued	2016-09
dc.identifier.citation	CATALÁN, Sandra, et al. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors	ca_CA
dc.identifier.issn	1386-7857
dc.identifier.issn	1573-7543
dc.identifier.uri	http://hdl.handle.net/10234/163597
dc.description.abstract	Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.	ca_CA
dc.description.sponsorShip	The researchers from Universitat Jaume I were supported by projects CICYT TIN2011-23283 and TIN2014-53495-R of MINECO and FEDER, the EU project FP7 318793 “EXA2GREEN” and the FPU program of MECD. The researcher from Universidad Complutense de Madrid was supported by project CICYT TIN2012-32180.	ca_CA
dc.format.extent	28 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Springer US	ca_CA
dc.relation.isPartOf	Cluster Computing, 2016, vol. 19, no. 3	ca_CA
dc.rights	© Springer Science+Business Media New York 2016	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	Matrix multiplication	ca_CA
dc.subject	Asymmetric multicore processors	ca_CA
dc.subject	Memory hierarchy	ca_CA
dc.subject	Scheduling	ca_CA
dc.subject	Multi-threading	ca_CA
dc.subject	High performance computing	ca_CA
dc.title	Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	http://dx.doi.org/10.1007/s10586-016-0611-8
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	http://link.springer.com/article/10.1007/s10586-016-0611-8	ca_CA

Files in this item

Name: 73857.pdf
Size: 597.3Kb
Format: PDF
Description: Author's pre-print version

This item appears in the following collection(s)

ICC_Articles [417]
