Show simple item record

dc.contributor.author: Dolz, Manuel F.
dc.contributor.author: Martínez, Héctor
dc.contributor.author: Alonso, Pedro
dc.contributor.author: Quintana-Orti, Enrique S.
dc.date.accessioned: 2023-03-06T10:38:28Z
dc.date.available: 2023-03-06T10:38:28Z
dc.date.issued: 2022
dc.identifier.citation: DOLZ, Manuel F., et al. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor. In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2022. p. 1-10.
dc.identifier.isbn: 9781665451550
dc.identifier.uri: http://hdl.handle.net/10234/201928
dc.description: Paper presented at the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Bordeaux, France.
dc.description.abstract: The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise, for the Fujitsu A64FX processor, three distinct methods for the calculation of the convolution, namely, the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
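Of the three methods named in the abstract, the lowering approach is the simplest to illustrate: input patches are unfolded into a matrix (often called im2col) so the convolution collapses into a single matrix multiplication. The following NumPy sketch shows the idea only; it is not the authors' optimised A64FX implementation, and the function name and shapes are illustrative assumptions.

```python
import numpy as np

def im2col_conv(x, w):
    """Convolution via the lowering (im2col) approach.

    Illustrative sketch: unfold input patches into a matrix, then
    compute the convolution as one GEMM (matrix multiplication).

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Returns output of shape (K, H-R+1, W-S+1) (no padding, stride 1).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    # Build the lowered matrix: one column per output pixel,
    # each column holding the C*R*S input values it depends on.
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + R, j:j + S].ravel()
    # The convolution is now a single GEMM against the flattened filters.
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, Ho, Wo)
```

The trade-off the paper studies is visible even here: lowering reuses a highly tuned GEMM but materialises the (much larger) `cols` buffer, whereas the direct and Winograd variants avoid or reduce that memory overhead.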
dc.format.extent: 10 p.
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: IEEE
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Convolutional neural networks
dc.subject: high performance
dc.subject: SIMD arithmetic units
dc.subject: ARM-based A64FX processor
dc.title: Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
dc.type: info:eu-repo/semantics/conferenceObject
dc.identifier.doi: https://doi.org/10.1109/SBAC-PAD55451.2022.00027
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.relation.publisherVersion: https://ieeexplore.ieee.org/document/9980987/authors#authors
dc.type.version: info:eu-repo/semantics/publishedVersion
project.funder.name: Ministerio de Ciencia, Innovación y Universidades
project.funder.name: Generalitat Valenciana
project.funder.name: European High Performance Computing Joint Undertaking (JU)
oaire.awardNumber: TIN2017-82972
oaire.awardNumber: Prometeo/2019/109
oaire.awardNumber: CDEIGENT/2018/014
oaire.awardNumber: Grant agreement No 955558


Files in this item


This item appears in the following collection(s)


Except where otherwise noted, this item's license is described as: http://creativecommons.org/licenses/by-nc-nd/4.0/