Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Authors: Dolz, Manuel F.; Martínez, Héctor; Alonso, Pedro; Quintana-Orti, Enrique S.
Title
Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor
Publication date
2022
Publisher
IEEE
ISBN
9781665451550
Bibliographic citation
DOLZ, Manuel F., et al. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor. In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2022. p. 1-10.
Document type
info:eu-repo/semantics/conferenceObject
Publisher's version
https://ieeexplore.ieee.org/document/9980987/authors#authors
Version
info:eu-repo/semantics/publishedVersion
Abstract
The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise, for the Fujitsu A64FX processor, three distinct methods for the calculation of the convolution, namely the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
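Of the three methods the abstract names, the lowering approach is the simplest to illustrate: input patches are unfolded into a matrix (often called im2col) so that the whole convolution reduces to a single general matrix multiplication (GEMM). The sketch below is an illustrative NumPy version of that idea only, not the paper's optimised A64FX implementation; the function name, shapes, stride-1/no-padding assumptions are ours.

```python
import numpy as np

def conv2d_lowering(x, w):
    """Convolution via the lowering (im2col + GEMM) approach.

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Assumes stride 1 and no padding (illustrative choice).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    # Lowering step: each column of `cols` is one flattened C x R x S patch.
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + R, j:j + S].ravel()
    # One large GEMM replaces the nested convolution loops.
    out = w.reshape(K, C * R * S) @ cols
    return out.reshape(K, Ho, Wo)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))   # 3 channels, 8x8 image
w = rng.standard_normal((4, 3, 3, 3))  # 4 filters of size 3x3
y = conv2d_lowering(x, w)            # shape (4, 6, 6)
```

The trade-off the paper's comparison revolves around is visible even here: lowering turns the convolution into a highly optimisable GEMM, but the `cols` buffer replicates overlapping input patches, which direct and Winograd variants avoid.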
Description
Paper presented at the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Bordeaux, France.
Funding entity
Ministerio de Ciencia, Innovación y Universidades | Generalitat Valenciana | European High Performance Computing Joint Undertaking (JU)
Project or grant code
TIN2017-82972 | Prometeo/2019/109 | CDEIGENT/2018/014 | Grant agreement No 955558
Access rights
info:eu-repo/semantics/openAccess