Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Other documents of the author: Dolz, Manuel F.; Martínez, Héctor; Alonso, Pedro; Quintana-Orti, Enrique S.
Title
Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Date
2022
Publisher
IEEE
ISBN
9781665451550
Bibliographic citation
DOLZ, Manuel F., et al. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2022. p. 1-10.
Type
info:eu-repo/semantics/conferenceObject
Publisher version
https://ieeexplore.ieee.org/document/9980987/authors#authors
Version
info:eu-repo/semantics/publishedVersion
Abstract
The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise, for the Fujitsu A64FX processor, three distinct methods for the calculation of the convolution, namely, the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
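As a rough illustration of two of the methods named in the abstract, the sketch below contrasts a plain direct convolution with the lowering approach, in which input patches are copied into a matrix (commonly called im2col) so that the convolution becomes a single matrix product. This is not the authors' A64FX implementation: it uses no SIMD, blocking, or parallelism. It also assumes stride 1, no padding, and the cross-correlation form usual in DL frameworks. The function names (`conv_direct`, `im2col`, `conv_lowering`) and the nested-list data layout are hypothetical choices made for this example.

```python
def conv_direct(x, w):
    """Direct convolution (cross-correlation form, stride 1, no padding).

    x: input indexed as x[c][i][j]; w: filters indexed as w[k][c][r][s].
    Returns the output indexed as y[k][i][j].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    y = [[[0.0] * Wo for _ in range(Ho)] for _ in range(K)]
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += w[k][c][r][s] * x[c][i + r][j + s]
                y[k][i][j] = acc
    return y


def im2col(x, R, S):
    """Lower the input into a (C*R*S) x (Ho*Wo) matrix of patches."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Ho, Wo = H - R + 1, W - S + 1
    return [
        [x[c][i + r][j + s] for i in range(Ho) for j in range(Wo)]
        for c in range(C) for r in range(R) for s in range(S)
    ]


def conv_lowering(x, w):
    """Convolution expressed as im2col followed by one matrix product."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    cols = im2col(x, R, S)  # (C*R*S) x (Ho*Wo)
    # Flatten each filter into one row, in the same (c, r, s) order: K x (C*R*S).
    wmat = [
        [w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
        for k in range(K)
    ]
    # Naive matrix product wmat @ cols; an optimised library GEMM would go here.
    out = [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*cols)]
        for row in wmat
    ]
    # Reshape each K x (Ho*Wo) row back into Ho rows of Wo values.
    return [[row[i * Wo:(i + 1) * Wo] for i in range(Ho)] for row in out]
```

Both functions compute the same result. The lowering variant trades extra memory for the patch matrix against the ability to reuse a tuned matrix-multiplication routine, which is the usual motivation for this approach.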
Description
Paper presented at the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Bordeaux, France.
Funder Name
Ministerio de Ciencia, Innovación y Universidades | Generalitat Valenciana | European High Performance Computing Joint Undertaking (JU)
Project code
TIN2017-82972 | Prometeo/2019/109 | CDEIGENT/2018/014 | Grant agreement No 955558
Rights
info:eu-repo/semantics/openAccess