Automatic generation of ARM NEON micro‑kernels for matrix multiplication
dc.contributor.author | Alaejos, Guillermo | |
dc.contributor.author | Martínez, Héctor | |
dc.contributor.author | Castelló, Adrián | |
dc.contributor.author | Dolz, Manuel F. | |
dc.contributor.author | Igual, Francisco | |
dc.contributor.author | Alonso-Jordá, Pedro | |
dc.contributor.author | Quintana-Orti, Enrique S. | |
dc.date.accessioned | 2024-05-15T07:45:24Z | |
dc.date.available | 2024-05-15T07:45:24Z | |
dc.date.issued | 2024-03-12 | |
dc.identifier.citation | Alaejos, G., Martínez, H., Castelló, A. et al. Automatic generation of ARM NEON micro-kernels for matrix multiplication. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05955-8 | ca_CA |
dc.identifier.issn | 0920-8542 | |
dc.identifier.issn | 1573-0484 | |
dc.identifier.uri | http://hdl.handle.net/10234/207342 | |
dc.description.abstract | General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel that is usually encoded in assembly. The high-performance realisation of gemm in linear algebra libraries generally includes a single micro-kernel per architecture, usually implemented by an expert. In this paper, we explore two paths to automatically generate gemm micro-kernels, either using C++ templates with vector intrinsics or high-level Python scripts that directly produce assembly code. Both solutions can integrate high-performance software techniques, such as loop unrolling and software pipelining, accommodate any data type, and easily generate micro-kernels of any requested dimension. The performance of these solutions is tested on three ARM-based cores and compared with state-of-the-art libraries for these processors: BLIS, OpenBLAS and ArmPL. The experimental results show that the auto-generation approach is highly competitive, m | ca_CA |
dc.description.sponsorShip | Funding for open access charge: CRUE-Universitat Jaume I | |
dc.format.extent | 27 p. | ca_CA |
dc.format.mimetype | application/pdf | ca_CA |
dc.language.iso | eng | ca_CA |
dc.publisher | Springer | ca_CA |
dc.rights | © The Author(s) 2024 | ca_CA |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | ca_CA |
dc.subject | matrix multiplication | ca_CA |
dc.subject | ARM NEON | ca_CA |
dc.subject | SIMD arithmetic units | ca_CA |
dc.subject | high performance | ca_CA |
dc.title | Automatic generation of ARM NEON micro‑kernels for matrix multiplication | ca_CA |
dc.type | info:eu-repo/semantics/article | ca_CA |
dc.identifier.doi | https://doi.org/10.1007/s11227-024-05955-8 | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca_CA |
dc.type.version | info:eu-repo/semantics/publishedVersion | ca_CA |
project.funder.name | CRUE-CSIC agreement with Springer Nature | ca_CA |
project.funder.name | European Commission, European Union | ca_CA |
project.funder.name | Junta de Andalucía | ca_CA |
project.funder.name | Agencia Estatal de Investigación | ca_CA |
project.funder.name | Generalitat Valenciana | ca_CA |
oaire.awardNumber | 95555 | ca_CA |
oaire.awardNumber | POSTDOC_21_00025 | ca_CA |
oaire.awardNumber | FJC2019-039222 | ca_CA |
oaire.awardNumber | PID2020-113656R | ca_CA |
oaire.awardNumber | PID2021-12657NB-I00 | ca_CA |
oaire.awardNumber | CIDEXG/2022/013 | ca_CA |
oaire.awardNumber | PROMETEO 2023-CIPROM/2022/20 | ca_CA |
dc.subject.ods | 9. Industry, innovation and infrastructure | ca_CA |
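The abstract describes gemm built on top of a small micro-kernel that, when generated from C++ templates, can accommodate any data type and any requested dimension. The sketch below illustrates only that general structure under simplifying assumptions; it is not the authors' generator. The function names `micro_kernel` and `gemm` are hypothetical, matrices are assumed column-major, m and n must be multiples of MR and NR (edge cases are omitted), and plain scalar loops stand in for the NEON vector intrinsics that a tuned kernel would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical MR x NR micro-kernel: C[MR x NR] += Ac * Bc, where
//   Ac is a packed MR x kc panel of A, stored column by column: Ac[p*MR + i]
//   Bc is a packed kc x NR panel of B, stored row by row:       Bc[p*NR + j]
//   C is column-major with leading dimension ldc.
// The template parameters T, MR and NR select the data type and kernel shape.
template <typename T, int MR, int NR>
void micro_kernel(int kc, const T* Ac, const T* Bc, T* C, int ldc) {
    T acc[MR][NR] = {};  // accumulators; a NEON kernel would keep these in vector registers
    for (int p = 0; p < kc; ++p) {
        for (int j = 0; j < NR; ++j) {
            const T b = Bc[p * NR + j];
            // Scalar stand-in for a vector fused multiply-add over MR rows.
            for (int i = 0; i < MR; ++i)
                acc[i][j] += Ac[p * MR + i] * b;
        }
    }
    // Update the MR x NR block of C once, after the whole kc loop.
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i)
            C[i + j * ldc] += acc[i][j];
}

// Hypothetical driver: C (m x n) += A (m x k) * B (k x n), all column-major.
// Packs panels of A and B into contiguous buffers and calls the micro-kernel.
// For brevity the A panel is repacked for every B panel and there is no
// cache blocking over k, m or n as a production library would have.
template <typename T, int MR, int NR>
void gemm(int m, int n, int k, const T* A, int lda, const T* B, int ldb,
          T* C, int ldc) {
    assert(m % MR == 0 && n % NR == 0);  // edge cases omitted in this sketch
    std::vector<T> Ac(static_cast<std::size_t>(MR) * k);
    std::vector<T> Bc(static_cast<std::size_t>(NR) * k);
    for (int jr = 0; jr < n; jr += NR) {
        // Pack a k x NR panel of B row by row.
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < NR; ++j)
                Bc[p * NR + j] = B[p + (jr + j) * ldb];
        for (int ir = 0; ir < m; ir += MR) {
            // Pack an MR x k panel of A column by column.
            for (int p = 0; p < k; ++p)
                for (int i = 0; i < MR; ++i)
                    Ac[p * MR + i] = A[(ir + i) + p * lda];
            micro_kernel<T, MR, NR>(k, Ac.data(), Bc.data(),
                                    C + ir + jr * ldc, ldc);
        }
    }
}
```

For example, `gemm<float, 8, 12>(...)` would instantiate an 8×12 single-precision kernel: changing the template arguments is how this sketch "generates" a kernel of another type or shape. Loop unrolling and software pipelining, which the abstract mentions, are left to the compiler here rather than emitted explicitly.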
This item appears in the following collection(s)
- ICC_Articles [427]