Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
dc.contributor.author | Catalán, Sandra | |
dc.contributor.author | Igual, Francisco | |
dc.contributor.author | Mayo, Rafael | |
dc.contributor.author | Rodríguez Sánchez, Rafael | |
dc.contributor.author | Quintana-Orti, Enrique S. | |
dc.date.accessioned | 2016-10-13T16:07:29Z | |
dc.date.available | 2016-10-13T16:07:29Z | |
dc.date.issued | 2016-09 | |
dc.identifier.citation | CATALÁN, Sandra, et al. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors | ca_CA |
dc.identifier.issn | 1386-7857 | |
dc.identifier.issn | 1573-7543 | |
dc.identifier.uri | http://hdl.handle.net/10234/163597 | |
dc.description.abstract | Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains with respect to their architecture-oblivious counterparts, while exploiting all the resources of the AMP to deliver considerable energy efficiency. | ca_CA |
dc.description.sponsorShip | The researchers from Universitat Jaume I were supported by projects CICYT TIN2011-23283 and TIN2014-53495-R of MINECO and FEDER, the EU project FP7 318793 “EXA2GREEN” and the FPU program of MECD. The researcher from Universidad Complutense de Madrid was supported by project CICYT TIN2012-32180. | ca_CA |
dc.format.extent | 28 p. | ca_CA |
dc.format.mimetype | application/pdf | ca_CA |
dc.language.iso | eng | ca_CA |
dc.publisher | Springer US | ca_CA |
dc.relation.isPartOf | Cluster Computing, 2016, vol. 19, no. 3 | ca_CA |
dc.rights | © Springer Science+Business Media New York 2016 | ca_CA |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | * |
dc.subject | Matrix multiplication | ca_CA |
dc.subject | Asymmetric multicore processors | ca_CA |
dc.subject | Memory hierarchy | ca_CA |
dc.subject | Scheduling | ca_CA |
dc.subject | Multi-threading | ca_CA |
dc.subject | High performance computing | ca_CA |
dc.title | Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors | ca_CA |
dc.type | info:eu-repo/semantics/article | ca_CA |
dc.identifier.doi | http://dx.doi.org/10.1007/s10586-016-0611-8 | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca_CA |
dc.relation.publisherVersion | http://link.springer.com/article/10.1007/s10586-016-0611-8 | ca_CA |
This item appears in the following collection(s)
- ICC_Articles [417]