Reformulating the direct convolution for high-performance deep learning inference on ARM processors

Barrachina Mir, Sergio; Castelló, Adrián; Dolz, Manuel F.; Low, Tze Meng; Martinez, Hector; Quintana-Orti, Enrique S.; Upasana, Sridhar; Tomás Domínguez, Andrés Enrique

dc.contributor.author	Barrachina Mir, Sergio
dc.contributor.author	Castelló, Adrián
dc.contributor.author	Dolz, Manuel F.
dc.contributor.author	Low, Tze Meng
dc.contributor.author	Martinez, Hector
dc.contributor.author	Quintana-Orti, Enrique S.
dc.contributor.author	Upasana, Sridhar
dc.contributor.author	Tomás Domínguez, Andrés Enrique
dc.date.accessioned	2023-01-30T08:14:06Z
dc.date.available	2023-01-30T08:14:06Z
dc.date.issued	2022-12-20
dc.identifier.citation	Barrachina, S., Castelló, A., Dolz, M. F., Low, T. M., Martínez, H., Quintana-Ortí, E. S., ... & Tomás, A. E. (2023). Reformulating the direct convolution for high-performance deep learning inference on ARM processors. Journal of Systems Architecture, 135, 102806.	ca_CA
dc.identifier.issn	1383-7621
dc.identifier.uri	http://hdl.handle.net/10234/201463
dc.description.abstract	We present two high-performance implementations of the convolution operator via the direct algorithm that outperform the so-called lowering approach based on the im2col transform plus the gemm kernel on an ARMv8-based processor. One of our methods presents the additional advantage of zero-memory overhead while the other employs an additional yet rather moderate workspace, substantially smaller than that required by the im2col+gemm solution. In contrast with a previous implementation of a similar zero-memory overhead direct convolution, this work exhibits the key advantage of preserving the conventional NHWC data layout for the input/output activations of the convolution layers.	ca_CA
dc.description.sponsorShip	Funding for open access charge: CRUE-Universitat Jaume I
dc.format.extent	13 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Elsevier	ca_CA
dc.relation.isPartOf	Journal of Systems Architecture 135 (2023) 102806	ca_CA
dc.rights	1383-7621/© 2022 The Author(s). Published by Elsevier B.V.	ca_CA
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	ca_CA
dc.subject	convolution	ca_CA
dc.subject	direct algorithm	ca_CA
dc.subject	deep learning	ca_CA
dc.subject	high performance	ca_CA
dc.subject	ARMv8 architecture	ca_CA
dc.title	Reformulating the direct convolution for high-performance deep learning inference on ARM processors	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1016/j.sysarc.2022.102806
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.type.version	info:eu-repo/semantics/publishedVersion	ca_CA
project.funder.name	Generalitat Valenciana	ca_CA
project.funder.name	Junta de Andalucía	ca_CA
project.funder.name	European High-Performance Computing Joint Undertaking (JU)	ca_CA
project.funder.name	European Union’s Horizon 2020	ca_CA
oaire.awardNumber	PID2020-113656RB-C21/-C22	ca_CA
oaire.awardNumber	MCIN/AEI/10.13039/501100011033	ca_CA
oaire.awardNumber	FJC2019-039222-I	ca_CA
oaire.awardNumber	MCIN/AEI/10.13039/501100011033	ca_CA
oaire.awardNumber	CDEIGENT/2018/014	ca_CA
oaire.awardNumber	POSTDOC_21_00025	ca_CA
oaire.awardNumber	955558	ca_CA

Files in this item

Name: barrachina_2022_reformulating.pdf
Size: 971.4Kb
Format: PDF

This item appears in the following collection(s)

ICC_Articles [424]
