High Performance and Portable Convolution Operators for Multicore Processors
Other documents by the authors: San Juan, Pablo; Castelló, Adrián; Dolz, Manuel F.; Alonso-Jordá, Pedro; Quintana-Ortí, Enrique S.
Metadata
Title
High Performance and Portable Convolution Operators for Multicore Processors
Authors
San Juan, Pablo; Castelló, Adrián; Dolz, Manuel F.; Alonso-Jordá, Pedro; Quintana-Ortí, Enrique S.
Publication date
2020-10
Publisher
IEEE
ISSN
2643-3001
Bibliographic citation
P. San Juan, A. Castelló, M. F. Dolz, P. Alonso-Jordá and E. S. Quintana-Ortí, "High Performance and Portable Convolution Operators for Multicore Processors," 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Porto, Portugal, 2020, pp. 91-98, doi: 10.1109/SBAC-PAD49847.2020.00023.
Document type
info:eu-repo/semantics/conferenceObject
Publisher's version
https://ieeexplore.ieee.org/document/9235053
Version
info:eu-repo/semantics/submittedVersion
Abstract
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence
tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the im2col transform
followed by a general matrix multiplication (gemm) in order to take advantage of the highly
optimized realizations of the gemm kernel in many linear algebra libraries. The main problems
of this approach are 1) the large memory workspace required to host the intermediate matrices
generated by the im2col transform; and 2) the time to perform the im2col transform, which
is not negligible for complex neural networks. This paper presents a portable high performance
convolution algorithm based on the BLIS realization of the gemm kernel that avoids the use of
the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed
algorithm eliminates the cost of the explicit im2col transform, while maintaining the portability
and performance of the underlying realization of gemm in BLIS.
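The im2col-plus-gemm approach described in the abstract, and the workspace it requires, can be illustrated with a minimal pure-Python sketch. This is not the authors' BLIS-based algorithm: the function names (`im2col`, `gemm`, `conv_im2col`, `conv_implicit`) are hypothetical, and the sketch assumes a single image, stride 1, and no padding. The last function only shows that the same result can be obtained by reading the input on the fly instead of storing the intermediate matrix; how the paper achieves this inside BLIS is not detailed in the abstract.

```python
# Illustrative sketch only (hypothetical names, not the paper's code).
# Assumptions: one image, stride 1, no padding, image[c][y][x] layout.

def im2col(image, kh, kw):
    """Build the explicit im2col matrix: C*kh*kw rows, Ho*Wo columns."""
    h, w = len(image[0]), len(image[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    rows = []
    for c in range(len(image)):
        for dy in range(kh):
            for dx in range(kw):
                rows.append([image[c][y + dy][x + dx]
                             for y in range(ho) for x in range(wo)])
    return rows

def gemm(a, b):
    """Naive matrix product C = A * B on lists of lists."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]

def conv_im2col(image, filters, kh, kw):
    """Convolution as gemm; filters[k] is a flat list of C*kh*kw weights."""
    return gemm(filters, im2col(image, kh, kw))

def conv_implicit(image, filters, kh, kw):
    """Same result, indexing the image directly (no intermediate matrix)."""
    h, w = len(image[0]), len(image[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    out = []
    for f in filters:
        row = []
        for y in range(ho):
            for x in range(wo):
                acc = 0
                for idx, wgt in enumerate(f):
                    # Recover (channel, dy, dx) from the flat weight index.
                    c, rem = divmod(idx, kh * kw)
                    dy, dx = divmod(rem, kw)
                    acc += wgt * image[c][y + dy][x + dx]
                row.append(acc)
        out.append(row)
    return out

image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]          # 1 channel, 3x3
filters = [[1, 0, 0, 1],       # top-left + bottom-right of each 2x2 window
           [1, 1, 1, 1]]       # sum of each 2x2 window
workspace = im2col(image, 2, 2)
result = conv_im2col(image, filters, 2, 2)
```

In this toy case the explicit im2col matrix holds 16 entries for a 9-entry input. With stride 1 the intermediate matrix grows by roughly a factor of kh*kw over the input, which is the memory and time overhead the abstract refers to.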
Description
Paper presented at the 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Porto, 9-11 September 2020
Research project
info:eu-repo/grantAgreement/MICIU/TIN2017-82972-R
info:eu-repo/grantAgreement/GVA/Prometeo-2019/109
info:eu-repo/grantAgreement/GVA/CDEIGENT/2018/014