Optimising Convolutions for Deep Learning Inference on ARM Cortex-M Processors
Title
Optimising Convolutions for Deep Learning Inference on ARM Cortex-M Processors
Publication date
2024-04-30
Publisher
Institute of Electrical and Electronics Engineers Inc.
ISSN
2327-4662
Bibliographic citation
Maciá, A., Barrachina Mir, S., Fabregat Llueca, G., & Dolz, M. F. (2024). "Optimising Convolutions for Deep Learning Inference on ARM Cortex-M Processors". IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2024.3395335
Document type
info:eu-repo/semantics/article
Publisher's version
https://ieeexplore.ieee.org/document/10513367
Version
info:eu-repo/semantics/publishedVersion
Keywords / Subjects
Abstract
We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. To this end, we develop custom microkernels that efficiently handle the internal computations required by the convolution operator via the lowering approach and the direct method, and we design two techniques to avoid register spilling. We also take advantage of all the RAM on the Arduino boards by reusing it as a scratchpad for the convolution filters. The integration of these techniques into CMSIS-NN, when invoked by TensorFlow Lite for Microcontrollers for quantised versions of VGG, SqueezeNet, ResNet, and MobileNet-like convolutional neural networks, enhances the overall inference speed by a factor ranging from 1.13× to 1.50×.
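The lowering approach mentioned in the abstract reshapes a convolution into a matrix multiplication (im2col followed by a GEMM). The sketch below is a minimal scalar C illustration of that idea for a stride-1, unpadded, int8 convolution with int32 accumulation, as used in quantised inference; the function names (`im2col`, `gemm_s8`) and the naive triple loop are assumptions made for illustration only, not the CMSIS-NN code or the paper's hand-tuned Cortex-M microkernels.

```c
/*
 * Minimal sketch of the "lowering" (im2col + GEMM) approach to convolution
 * referred to in the abstract. Illustrative assumption, not the CMSIS-NN
 * implementation: stride 1, no padding, scalar arithmetic, int8 inputs with
 * int32 accumulation as in quantised inference.
 */
#include <stdint.h>
#include <stdio.h>

/* Unfold each KxK patch of a CxHxW input into one column of a
 * (C*K*K) x (H_out*W_out) matrix, so the convolution becomes a GEMM. */
static void im2col(const int8_t *in, int c, int h, int w, int k,
                   int8_t *col, int h_out, int w_out)
{
    for (int ch = 0; ch < c; ++ch)
        for (int kr = 0; kr < k; ++kr)
            for (int kc = 0; kc < k; ++kc) {
                int row = (ch * k + kr) * k + kc;
                for (int y = 0; y < h_out; ++y)
                    for (int x = 0; x < w_out; ++x)
                        col[row * (h_out * w_out) + y * w_out + x] =
                            in[ch * h * w + (y + kr) * w + (x + kc)];
            }
}

/* Naive GEMM on the lowered matrix: out (m x n) = a (m x kk) * b (kk x n).
 * The paper replaces this triple loop with hand-tuned Cortex-M microkernels
 * that keep the accumulators in registers to avoid spilling. */
static void gemm_s8(const int8_t *a, const int8_t *b, int32_t *out,
                    int m, int kk, int n)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (int p = 0; p < kk; ++p)
                acc += (int32_t)a[i * kk + p] * (int32_t)b[p * n + j];
            out[i * n + j] = acc;
        }
}

int main(void)
{
    /* 1 input channel, 4x4 image, one 3x3 filter -> 2x2 output. */
    const int8_t in[16]  = { 1, 2, 3, 4,  5, 6, 7, 8,
                             9,10,11,12, 13,14,15,16 };
    const int8_t filt[9] = { 0, 0, 0,  0, 1, 0,  0, 0, 0 }; /* centre tap */
    int8_t  col[9 * 4];
    int32_t out[4];

    im2col(in, 1, 4, 4, 3, col, 2, 2);
    gemm_s8(filt, col, out, 1, 9, 4);

    for (int i = 0; i < 4; ++i)
        printf("%ld ", (long)out[i]);  /* prints: 6 7 10 11 */
    printf("\n");
    return 0;
}
```

Compiled with any C compiler, the example prints 6 7 10 11: the identity 3x3 filter picks out the centre of each lowered patch, confirming that the GEMM over the im2col matrix reproduces the convolution.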
Published in
IEEE Internet of Things Journal, 2024
Funding entity
European Union NextGenerationEU
Project or grant code
TED2021-129334B
Access rights
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [423]