High Performance and Portable Convolution Operators for Multicore Processors
Metadata
Title
High Performance and Portable Convolution Operators for Multicore Processors
Author(s)
San Juan, Pablo; Castelló, Adrián; Dolz, Manuel F.; Alonso-Jordá, Pedro; Quintana-Ortí, Enrique S.
Date
2020-10
Publisher
IEEE
ISSN
2643-3001
Bibliographic citation
P. San Juan, A. Castelló, M. F. Dolz, P. Alonso-Jordá and E. S. Quintana-Ortí, "High Performance and Portable Convolution Operators for Multicore Processors," 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Porto, Portugal, 2020, pp. 91-98, doi: 10.1109/SBAC-PAD49847.2020.00023.
Type
info:eu-repo/semantics/conferenceObject
Publisher version
https://ieeexplore.ieee.org/document/9235053
Version
info:eu-repo/semantics/submittedVersion
Abstract
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the im2col transform followed by a general matrix multiplication (gemm) in order to take advantage of the highly optimized realizations of the gemm kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the im2col transform; and 2) the time to perform the im2col transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the gemm kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit im2col transform, while maintaining the portability and performance of the underlying realization of gemm in BLIS.
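To illustrate the baseline im2col + gemm approach mentioned in the abstract, and why its workspace grows, here is a minimal pure-Python sketch. It is not the paper's BLIS-based algorithm. It assumes a single image, stride 1 and no padding; the helper names (`im2col`, `gemm`, `conv_im2col`, `conv_direct`) and the tiny test tensors are hypothetical and chosen only for illustration.

```python
def im2col(x, kh, kw):
    """Unfold input x[C][H][W] into a (C*kh*kw) x (Ho*Wo) matrix (stride 1, no padding)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    rows = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel column) triple.
                rows.append([x[ci][oy + i][ox + j]
                             for oy in range(ho) for ox in range(wo)])
    return rows


def gemm(a, b):
    """Naive C = A * B; stands in for an optimized library gemm kernel."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][q] for p in range(k)) for q in range(m)]
            for row in a]


def conv_im2col(x, f):
    """Convolve x[C][H][W] with filters f[K][C][kh][kw] via im2col followed by gemm."""
    kh, kw = len(f[0][0]), len(f[0][0][0])
    # Flatten each filter into one row, in the same (c, i, j) order as the im2col rows.
    a = [[flt[c][i][j] for c in range(len(flt)) for i in range(kh) for j in range(kw)]
         for flt in f]
    b = im2col(x, kh, kw)  # the intermediate workspace the abstract refers to
    return gemm(a, b), b


def conv_direct(x, f):
    """Reference convolution with explicit loops and no intermediate matrix."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(f[0][0]), len(f[0][0][0])
    ho, wo = h - kh + 1, w - kw + 1
    return [[sum(flt[ci][i][j] * x[ci][oy + i][ox + j]
                 for ci in range(c) for i in range(kh) for j in range(kw))
             for oy in range(ho) for ox in range(wo)]
            for flt in f]


# Hypothetical toy problem: 2 channels of 4x4 input, 3 filters of size 3x3.
x = [[[c * 16 + y * 4 + w for w in range(4)] for y in range(4)] for c in range(2)]
f = [[[[(k + c + i + j) % 3 - 1 for j in range(3)] for i in range(3)]
      for c in range(2)] for k in range(3)]

out, workspace = conv_im2col(x, f)
assert out == conv_direct(x, f)  # both routes compute the same result

input_elems = len(x) * len(x[0]) * len(x[0][0])
workspace_elems = len(workspace) * len(workspace[0])
print("input elements:", input_elems, "im2col workspace elements:", workspace_elems)
```

Even in this toy case the unfolded matrix holds more elements than the input itself, because overlapping receptive fields are copied repeatedly; this is the memory and copy overhead the abstract identifies as the drawback of the explicit im2col step.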
Description
Paper presented at the 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), held in Porto, 9-11 September 2020
Investigation project
info:eu-repo/grantAgreement/MICIU/TIN2017-82972-R
info:eu-repo/grantAgreement/GVA/Prometeo-2019/109
info:eu-repo/grantAgreement/GVA/CDEIGENT/2018/014