
dc.contributor.author: Alaejos, Guillermo
dc.contributor.author: Martínez, Héctor
dc.contributor.author: Castelló, Adrián
dc.contributor.author: Dolz, Manuel F.
dc.contributor.author: Igual, Francisco
dc.contributor.author: Alonso-Jordá, Pedro
dc.contributor.author: Quintana-Orti, Enrique S.
dc.date.accessioned: 2024-05-15T07:45:24Z
dc.date.available: 2024-05-15T07:45:24Z
dc.date.issued: 2024-03-12
dc.identifier.citation: Alaejos, G., Martínez, H., Castelló, A. et al. Automatic generation of ARM NEON micro-kernels for matrix multiplication. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05955-8 [ca_CA]
dc.identifier.issn: 0920-8542
dc.identifier.issn: 1573-0484
dc.identifier.uri: http://hdl.handle.net/10234/207342
dc.description.abstract: General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel that is usually encoded in assembly. High-performance realisations of gemm in linear algebra libraries generally include a single micro-kernel per architecture, usually implemented by an expert. In this paper, we explore two paths to automatically generate gemm micro-kernels, using either C++ templates with vector intrinsics or high-level Python scripts that directly produce assembly code. Both solutions can integrate high-performance software techniques, such as loop unrolling and software pipelining, accommodate any data type, and easily generate micro-kernels of any requested dimension. The performance of this solution is tested on three ARM-based cores and compared with state-of-the-art libraries for these processors: BLIS, OpenBLAS and ArmPL. The experimental results show that the auto-generation approach is highly competitive, m… [ca_CA]
dc.description.sponsorShip: Funding for open access charge: CRUE-Universitat Jaume I
dc.format.extent: 27 p. [ca_CA]
dc.format.mimetype: application/pdf [ca_CA]
dc.language.iso: eng [ca_CA]
dc.publisher: Springer [ca_CA]
dc.rights: © The Author(s) 2024 [ca_CA]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [ca_CA]
dc.subject: matrix multiplication [ca_CA]
dc.subject: ARM NEON [ca_CA]
dc.subject: SIMD arithmetic units [ca_CA]
dc.subject: high performance [ca_CA]
dc.title: Automatic generation of ARM NEON micro-kernels for matrix multiplication [ca_CA]
dc.type: info:eu-repo/semantics/article [ca_CA]
dc.identifier.doi: https://doi.org/10.1007/s11227-024-05955-8
dc.rights.accessRights: info:eu-repo/semantics/openAccess [ca_CA]
dc.type.version: info:eu-repo/semantics/publishedVersion [ca_CA]
project.funder.name: CRUE-CSIC agreement with Springer Nature [ca_CA]
project.funder.name: European Commission, European Union [ca_CA]
project.funder.name: Junta de Andalucía [ca_CA]
project.funder.name: Agencia Estatal de Investigación [ca_CA]
project.funder.name: Generalitat Valenciana [ca_CA]
oaire.awardNumber: 95555 [ca_CA]
oaire.awardNumber: POSTDOC_21_00025 [ca_CA]
oaire.awardNumber: FJC2019-039222 [ca_CA]
oaire.awardNumber: PID2020-113656R [ca_CA]
oaire.awardNumber: PID2021-12657NB-I00 [ca_CA]
oaire.awardNumber: CIDEXG/2022/013 [ca_CA]
oaire.awardNumber: PROMETEO 2023-CIPROM/2022/20 [ca_CA]
dc.subject.ods: 9. Industry, innovation and infrastructure [ca_CA]



Except where otherwise noted, this item's license is described as: © The Author(s) 2024