dc.contributor.author | Catalán, Sandra
dc.contributor.author | Igual, Francisco
dc.contributor.author | Mayo, Rafael
dc.contributor.author | Rodríguez Sánchez, Rafael
dc.contributor.author | Quintana-Orti, Enrique S.
dc.date.accessioned | 2016-10-13T16:07:29Z
dc.date.available | 2016-10-13T16:07:29Z
dc.date.issued | 2016-09
dc.identifier.citation | CATALÁN, Sandra, et al. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors | ca_CA
dc.identifier.issn | 1386-7857
dc.identifier.issn | 1573-7543
dc.identifier.uri | http://hdl.handle.net/10234/163597
dc.description.abstract | Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency. | ca_CA
dc.description.sponsorShip | The researchers from Universitat Jaume I were supported by projects CICYT TIN2011-23283 and TIN2014-53495-R of MINECO and FEDER, the EU project FP7 318793 “EXA2GREEN” and the FPU program of MECD. The researcher from Universidad Complutense de Madrid was supported by project CICYT TIN2012-32180. | ca_CA
dc.format.extent | 28 p. | ca_CA
dc.format.mimetype | application/pdf | ca_CA
dc.language.iso | eng | ca_CA
dc.publisher | Springer US | ca_CA
dc.relation.isPartOf | Cluster Computing, 2016, vol. 19, no. 3 | ca_CA
dc.rights | © Springer Science+Business Media New York 2016 | ca_CA
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | *
dc.subject | Matrix multiplication | ca_CA
dc.subject | Asymmetric multicore processors | ca_CA
dc.subject | Memory hierarchy | ca_CA
dc.subject | Scheduling | ca_CA
dc.subject | Multi-threading | ca_CA
dc.subject | High performance computing | ca_CA
dc.title | Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors | ca_CA
dc.type | info:eu-repo/semantics/article | ca_CA
dc.identifier.doi | http://dx.doi.org/10.1007/s10586-016-0611-8
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca_CA
dc.relation.publisherVersion | http://link.springer.com/article/10.1007/s10586-016-0611-8 | ca_CA
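The abstract mentions asymmetric static scheduling that distributes the gemm micro-kernels between the big and LITTLE cores. The Python sketch below illustrates only that general idea under simplifying assumptions: it splits the n dimension of C into micro-panels of width nr and gives each core a contiguous range of panels in proportion to an assumed big-to-LITTLE speed ratio. The function `asymmetric_static_partition`, its parameters, and the use of a single fixed ratio are hypothetical; this is not the authors' BLIS-based implementation, and it does not model the per-cluster cache-aware configuration or the dynamic scheduling variant described in the abstract.

```python
def asymmetric_static_partition(n, nr, n_big, n_little, ratio):
    """Hypothetical helper: split the n columns of C among big and LITTLE cores.

    Columns are grouped into micro-panels of width nr (the last panel may be
    narrower), and each core receives a contiguous range of panels whose size
    is proportional to its weight: `ratio` for a big core, 1 for a LITTLE core.
    Returns a list of (core_type, start, end) half-open column ranges.
    """
    if (n < 0 or nr <= 0 or ratio <= 0 or n_big < 0 or n_little < 0
            or n_big + n_little == 0):
        raise ValueError("invalid partition parameters")

    panels = (n + nr - 1) // nr  # number of nr-wide micro-panels (ceiling)
    kinds = ["big"] * n_big + ["LITTLE"] * n_little
    weights = [float(ratio)] * n_big + [1.0] * n_little
    total = sum(weights)

    ranges = []
    cum = 0.0
    prev_panel = 0
    for i, (kind, w) in enumerate(zip(kinds, weights)):
        cum += w
        # Round the cumulative share to a whole panel so every boundary falls
        # on a micro-kernel edge; the last core always ends at the final panel.
        if i == len(kinds) - 1:
            panel_end = panels
        else:
            panel_end = round(panels * cum / total)
        start = min(prev_panel * nr, n)
        end = min(panel_end * nr, n)
        ranges.append((kind, start, end))
        prev_panel = panel_end
    return ranges


if __name__ == "__main__":
    # Assumed example: 4 big + 4 LITTLE cores, big cores taken to be 3x faster.
    for kind, start, end in asymmetric_static_partition(1000, 4, 4, 4, 3):
        print(f"{kind:6s} columns [{start}, {end})  width {end - start}")
```

With these assumed inputs (n = 1000, nr = 4, ratio = 3), every big core receives 188 columns and the LITTLE cores receive 60 or 64. Rounding at panel granularity keeps each core's share a whole number of micro-kernel calls, which is why the LITTLE shares alternate rather than being exactly equal.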

