Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors
Authors: Catalán, Sandra; Herrero Zaragoza, José R.; Igual, Francisco; Rodríguez Sánchez, Rafael; Quintana-Orti, Enrique S.; Adeniyi-Jones, Chris
This resource is restricted
https://doi.org/10.1016/j.jocs.2016.10.020
Metadata
Title
Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors
Publication date
2018-03
Publisher
Elsevier
Bibliographic citation
CATALÁN, Sandra, et al. Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors. Journal of Computational Science, Volume 25, March 2018, Pages 140-151.
Document type
info:eu-repo/semantics/article
Publisher's version
https://www.sciencedirect.com/science/article/pii/S1877750316302812
Version
info:eu-repo/semantics/publishedVersion
Abstract
Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures, the adaption of these libraries for asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the BLIS framework, and tailored for AMPs equipped with two types of cores: fast/power-hungry versus slow/energy-efficient. For this purpose, we integrate coarse-grain and fine-grain parallelization strategies into the library routines which, respectively, dynamically distribute the workload between the two core types and statically repartition this work among the cores of the same type.
Our results on an ARM® big.LITTLE™ processor embedded in the Exynos 5422 SoC, using the asymmetry-aware version of the BLAS and a plain migration of the legacy version of LAPACK, experimentally assess the benefits, limitations, and potential of this approach from the perspectives of both throughput and energy efficiency.
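The two-level scheme described in the abstract can be illustrated with a small sketch. This is not the BLIS code or API: the function names, the chunk sizes, and the choice to partition the rows of the result are all assumptions made for illustration, and Python threads here only stand in for the fast and slow core clusters. Each "cluster" dynamically grabs the next chunk of rows from a shared counter (coarse grain), then splits that chunk statically among its own cores (fine grain).

```python
import threading

def matmul_rows(A, B, C, rows):
    """Compute the given rows of C = A * B with a plain loop nest."""
    k, n = len(B), len(B[0])
    for i in rows:
        C[i] = [sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]

def static_split(lo, hi, parts):
    """Fine grain: split [lo, hi) into `parts` near-equal contiguous ranges."""
    size, extra = divmod(hi - lo, parts)
    ranges, start = [], lo
    for t in range(parts):
        end = start + size + (1 if t < extra else 0)
        ranges.append(range(start, end))
        start = end
    return ranges

def cluster_worker(A, B, C, state, lock, chunk, cores, log, name):
    """Coarse grain: one cluster repeatedly claims the next chunk of rows."""
    m = len(A)
    while True:
        with lock:  # dynamic distribution via a shared "next free row" counter
            lo = state["next"]
            if lo >= m:
                return
            hi = min(lo + chunk, m)
            state["next"] = hi
            log.append((name, lo, hi))
        # Static repartition of the claimed chunk among this cluster's cores.
        threads = [threading.Thread(target=matmul_rows, args=(A, B, C, r))
                   for r in static_split(lo, hi, cores)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

def asymmetric_gemm(A, B, big_chunk=8, little_chunk=2, cores_per_cluster=4):
    """Hypothetical driver: the chunk sizes are illustrative, not tuned values."""
    C = [None] * len(A)
    state, lock, log = {"next": 0}, threading.Lock(), []
    clusters = [
        threading.Thread(target=cluster_worker, args=(
            A, B, C, state, lock, big_chunk, cores_per_cluster, log, "big")),
        threading.Thread(target=cluster_worker, args=(
            A, B, C, state, lock, little_chunk, cores_per_cluster, log, "little")),
    ]
    for t in clusters:
        t.start()
    for t in clusters:
        t.join()
    return C, log
```

Calling `asymmetric_gemm(A, B)` on two list-of-lists matrices returns the product together with a log of which cluster claimed which row range. Giving the "big" cluster a larger chunk is one simple way to reflect its higher throughput; a real library would pin threads to physical cores and choose block sizes per core type.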
Research project
MINECO and FEDER (CICYTTIN2011-23283 and TIN2014-53495-R), (CICYTTIN2015-65277-R); Spanish Ministry of Education (TIN2015-65316-P); Generalitat de Catalunya, Dep. d'Innovació, Universitats i Empresa (2014 SGR 1051)
Access rights
© 2016 Elsevier B.V. All rights reserved.
http://rightsstatements.org/vocab/InC/1.0/
info:eu-repo/semantics/restrictedAccess
Appears in collections
- ICC_Articles [427]