• openAccess   Automatic generation of ARM NEON micro‑kernels for matrix multiplication 

      Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)
      General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...
    • openAccess   Efficient and portable Winograd convolutions for multi-core processors 

      Dolz, Manuel F.; Martínez, Héctor; Castelló, Adrián; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2023-02-12)
      We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, ...
    • openAccess   Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

      Dolz, Manuel F.; Barrachina Mir, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat Llueca, German; Tomás, Andrés E. Springer (2023)
      In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) ...