Show simple item record

dc.contributor.author: Castelló, Adrián
dc.contributor.author: Barrachina Mir, Sergio
dc.contributor.author: Dolz, Manuel F.
dc.contributor.author: Quintana-Ortí, Enrique S.
dc.contributor.author: San Juan, Pau
dc.contributor.author: Tomás Domínguez, Andrés Enrique
dc.date.accessioned: 2022-05-24T13:04:49Z
dc.date.available: 2022-05-24T13:04:49Z
dc.date.issued: 2022-03-22
dc.identifier.citation: Castelló, A., Barrachina, S., Dolz, M. F., Quintana-Ortí, E. S., San Juan, P., & Tomás, A. E. (2022). High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS. Journal of Systems Architecture, 125, 102459.
dc.identifier.issn: 1383-7621
dc.identifier.uri: http://hdl.handle.net/10234/197784
dc.description.abstract: We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework, such as the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for the matrix multiplication, vectorized with ARM’s NEON intrinsics, that can accommodate layer fusion; and the appropriate selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate both inference throughput (measured in processed images/s) and inference latency (i.e., time-to-response), as well as energy consumption per image, when varying the level of thread parallelism and the processor power modes. The experiments with the new inference engine are reported for the ResNet50 v1.5 model on the ImageNet dataset from the MLPerf suite using the ARM v8.2 cores in the NVIDIA Jetson AGX Xavier board. These results show superior performance compared with the widely used TFLite from Google, and slightly inferior results when compared with ArmNN, the native library from ARM for DNN inference.
dc.format.extent: 9 p.
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Elsevier
dc.publisher: North-Holland
dc.relation.isPartOf: Journal of Systems Architecture. 125 (2022) 102459
dc.rights: 1383-7621/© 2022 The Authors. Published by Elsevier B.V.
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: convolutional neural network
dc.subject: inference
dc.subject: multicore low-power processors
dc.title: High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS
dc.type: info:eu-repo/semantics/article
dc.identifier.doi: https://doi.org/10.1016/j.sysarc.2022.102459
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.type.version: info:eu-repo/semantics/publishedVersion
project.funder.name: Ministerio de Ciencia, Innovación y Universidades (Spain)
project.funder.name: Generalitat Valenciana
oaire.awardNumber: TIN2017-82972-R
oaire.awardNumber: Prometeo/2019/109
oaire.awardNumber: FJC2019-039222-I
oaire.awardNumber: CDEIGENT/2018/014
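
Note: the abstract above cites matrix-multiplication micro-kernels vectorized with ARM's NEON intrinsics as one of the key optimizations. Below is a minimal, hypothetical sketch of what such a micro-kernel can look like for single precision on AArch64. The 4x4 register blocking, the packed operand layout, and the name gemm_ukernel_4x4 are illustrative assumptions, not the actual BLIS or PyDTNN code; production kernels use larger, architecture-tuned blockings.

/* Hypothetical 4x4 GEMM micro-kernel sketch using ARM NEON intrinsics
 * (AArch64, single precision). Not the BLIS/PyDTNN kernel itself; it
 * illustrates the register-blocked, FMA-based structure the abstract
 * refers to.
 *
 * C is a 4x4 column-major block with leading dimension ldc;
 * A is packed as k column panels of 4 floats, B as k row panels of 4.
 */
#include <arm_neon.h>

static void gemm_ukernel_4x4(int k, const float *A, const float *B,
                             float *C, int ldc)
{
    /* Keep the 4x4 block of C in vector registers, one column each. */
    float32x4_t c0 = vld1q_f32(&C[0 * ldc]);
    float32x4_t c1 = vld1q_f32(&C[1 * ldc]);
    float32x4_t c2 = vld1q_f32(&C[2 * ldc]);
    float32x4_t c3 = vld1q_f32(&C[3 * ldc]);

    for (int p = 0; p < k; ++p) {
        /* One packed column of A and one packed row of B. */
        float32x4_t a = vld1q_f32(&A[4 * p]);
        float32x4_t b = vld1q_f32(&B[4 * p]);

        /* Rank-1 update via fused multiply-add by lane:
         * column j of C += a * b[j]. */
        c0 = vfmaq_laneq_f32(c0, a, b, 0);
        c1 = vfmaq_laneq_f32(c1, a, b, 1);
        c2 = vfmaq_laneq_f32(c2, a, b, 2);
        c3 = vfmaq_laneq_f32(c3, a, b, 3);
    }

    vst1q_f32(&C[0 * ldc], c0);
    vst1q_f32(&C[1 * ldc], c1);
    vst1q_f32(&C[2 * ldc], c2);
    vst1q_f32(&C[3 * ldc], c3);
}

Keeping the block of C in registers while streaming packed panels of A and B is also why the cache configuration parameters mentioned in the abstract matter: the blocking sizes of the loops surrounding such a micro-kernel are chosen so the packed buffers fit the target levels of the memory hierarchy.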


