Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Metadata
Title
Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Publication date
2022
Publisher
IEEE
ISBN
9781665451550
Bibliographic citation
DOLZ, Manuel F., et al. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2022. p. 1-10.
Document type
info:eu-repo/semantics/conferenceObject
Publisher's version
https://ieeexplore.ieee.org/document/9980987/authors#authors
Version
info:eu-repo/semantics/publishedVersion
Keywords / Subjects
Abstract
The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high-performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise, for the Fujitsu A64FX processor, three distinct methods for the calculation of the convolution: the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
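Of the three methods the abstract names, the lowering approach is the simplest to illustrate: the input is unfolded into a patch matrix (often called im2col) so the whole convolution becomes a single large matrix multiplication. The following is a minimal NumPy sketch of that idea, assuming stride 1 and no padding; it is an illustration only, not the paper's optimised A64FX implementation, and the function names are ours.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    whose columns are the sliding kh-by-kw patches (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for ci in range(c):          # row ordering matches weights.reshape below
        for i in range(kh):
            for j in range(kw):
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].ravel()
                idx += 1
    return cols

def conv_lowering(x, weights):
    """Convolution via lowering: im2col followed by one GEMM.
    x: (C, H, W); weights: (K, C, kh, kw); returns (K, out_h, out_w)."""
    k, c, kh, kw = weights.shape
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    a = weights.reshape(k, c * kh * kw)      # each filter flattened into a row
    b = im2col(x, kh, kw)                    # patches as columns
    return (a @ b).reshape(k, out_h, out_w)  # the single large GEMM
```

The appeal of this formulation, and the reason it maps well onto long-SIMD hardware, is that all the arithmetic lands in one GEMM, for which highly tuned vectorised kernels exist; the cost is the extra memory traffic of materialising the patch matrix.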
Description
Paper presented at the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Bordeaux, France.
Funding entity
Ministerio de Ciencia, Innovación y Universidades | Generalitat Valenciana | European High Performance Computing Joint Undertaking (JU)
Project or grant code
TIN2017-82972 | Prometeo/2019/109 | CDEIGENT/2018/014 | Grant agreement No 955558
Access rights
info:eu-repo/semantics/openAccess