Browse by subject "high performance"

Automatic generation of ARM NEON micro‑kernels for matrix multiplication

Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2024-03-12)

General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel ...

Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor

Dolz, Manuel F.; Martínez, Héctor; Alonso, Pedro; Quintana-Orti, Enrique S. IEEE (2022)

The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received ...

Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors

Barrachina Mir, Sergio; Dolz, Manuel F.; San Juan, Pablo; Quintana-Orti, Enrique S. Elsevier (2022-05-30)

Convolutional Neural Networks (CNNs) play a crucial role in many image recognition and classification tasks, recommender systems, brain-computer interfaces, etc. As a consequence, there is a notable interest in developing ...

Efficient and portable Winograd convolutions for multi-core processors

Dolz, Manuel F.; Martínez, Héctor; Castelló, Adrián; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2023-02-12)

We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, ...

Exploiting the capabilities of modern GPUs for dense matrix computations

Barrachina Mir, Sergio; Castillo Catalán, María Isabel; Igual, Francisco; Mayo, Rafael; Quintana-Orti, Enrique S.; Quintana-Ortí, Gregorio John Wiley & Sons (2009)

We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. ...

Exploring the performance–power–energy balance of low-power multicore and manycore architectures for anomaly detection in remote sensing

León Navarro, Germán; Molero, Jose M.; Garzon, E.M.; García, I.; Plaza, Antonio; Quintana-Orti, Enrique S. Springer Verlag (2015)

In this paper, we perform an experimental study of the interactions between execution time (i.e., performance), power, and energy that occur in modern low-power architectures when executing the RX algorithm for detecting ...

Repositori Universitat Jaume I