Reformulating the direct convolution for high-performance deep learning inference on ARM processors
dc.contributor.author | Barrachina Mir, Sergio | |
dc.contributor.author | Castelló, Adrián | |
dc.contributor.author | Dolz, Manuel F. | |
dc.contributor.author | Low, Tze Meng | |
dc.contributor.author | Martínez, Héctor | |
dc.contributor.author | Quintana-Ortí, Enrique S. | |
dc.contributor.author | Sridhar, Upasana | |
dc.contributor.author | Tomás Domínguez, Andrés Enrique | |
dc.date.accessioned | 2023-01-30T08:14:06Z | |
dc.date.available | 2023-01-30T08:14:06Z | |
dc.date.issued | 2022-12-20 | |
dc.identifier.citation | Barrachina, S., Castelló, A., Dolz, M. F., Low, T. M., Martínez, H., Quintana-Ortí, E. S., ... & Tomás, A. E. (2023). Reformulating the direct convolution for high-performance deep learning inference on ARM processors. Journal of Systems Architecture, 135, 102806. | ca_CA |
dc.identifier.issn | 1383-7621 | |
dc.identifier.uri | http://hdl.handle.net/10234/201463 | |
dc.description.abstract | We present two high-performance implementations of the convolution operator via the direct algorithm that outperform the so-called lowering approach based on the im2col transform plus the gemm kernel on an ARMv8-based processor. One of our methods presents the additional advantage of zero-memory overhead while the other employs an additional yet rather moderate workspace, substantially smaller than that required by the im2col+gemm solution. In contrast with a previous implementation of a similar zero-memory overhead direct convolution, this work exhibits the key advantage of preserving the conventional NHWC data layout for the input/output activations of the convolution layers. | ca_CA |
dc.description.sponsorShip | Funding for open access charge: CRUE-Universitat Jaume I | |
dc.format.extent | 13 p. | ca_CA |
dc.format.mimetype | application/pdf | ca_CA |
dc.language.iso | eng | ca_CA |
dc.publisher | Elsevier | ca_CA |
dc.relation.isPartOf | Journal of Systems Architecture 135 (2023) 102806 | ca_CA |
dc.rights | 1383-7621/© 2022 The Author(s). Published by Elsevier B.V. | ca_CA |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | ca_CA |
dc.subject | convolution | ca_CA |
dc.subject | direct algorithm | ca_CA |
dc.subject | deep learning | ca_CA |
dc.subject | high performance | ca_CA |
dc.subject | ARMv8 architecture | ca_CA |
dc.title | Reformulating the direct convolution for high-performance deep learning inference on ARM processors | ca_CA |
dc.type | info:eu-repo/semantics/article | ca_CA |
dc.identifier.doi | https://doi.org/10.1016/j.sysarc.2022.102806 | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca_CA |
dc.type.version | info:eu-repo/semantics/publishedVersion | ca_CA |
project.funder.name | Generalitat Valenciana | ca_CA |
project.funder.name | Junta de Andalucía | ca_CA |
project.funder.name | European High-Performance Computing Joint Undertaking (JU) | ca_CA |
project.funder.name | European Union’s Horizon 2020 | ca_CA |
oaire.awardNumber | PID2020-113656RB-C21/-C22 | ca_CA |
oaire.awardNumber | MCIN/AEI/10.13039/501100011033 | ca_CA |
oaire.awardNumber | FJC2019-039222-I | ca_CA |
oaire.awardNumber | MCIN/AEI/10.13039/501100011033 | ca_CA |
oaire.awardNumber | CDEIGENT/2018/014 | ca_CA |
oaire.awardNumber | POSTDOC_21_00025 | ca_CA |
oaire.awardNumber | 955558 | ca_CA |
Files in this item
This item appears in the following collection(s)
- ICC_Articles [423]