Browse ICC_Articles by author "511c94cb-8547-4534-af1b-edcfd848481f"

Evaluating the performance and energy efficiency of the COSMO-ART model system

Charles, Joseph; Sawyer, William; Dolz, Manuel F.; Catalán, Sandra Springer Berlin Heidelberg (2015-05)

In this paper we investigate the energy footprint and performance profiling of COSMO-ART on various HPC platforms. This model is an extension of the operational weather forecast model of the German weather service (DWD), ...

Exploring stream parallel patterns in distributed MPI environments

López-Gómez, Javier; Fernández Muñoz, Javier; del Río Astorga, David; Dolz, Manuel F.; García, J. Daniel Elsevier (2019)

In recent years, the large volumes of stream data and the near real-time requirements of data streaming applications have exacerbated the need for new scalable algorithms and programming interfaces for distributed and ...

Finding parallel patterns through static analysis in C++ applications

del Río Astorga, David; Dolz, Manuel F.; Sánchez García, Luis Miguel; García, J. Daniel; Danelutto, Marco; Torquati, Massimo Sage (2017-03)

Since the ‘free lunch’ of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given ...

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS

Castelló, Adrián; Barrachina Mir, Sergio; Dolz, Manuel F.; Quintana-Orti, Enrique S.; San Juan, Pau; Tomás Domínguez, Andrés Enrique Elsevier (2022-03-22)

We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors ...

Hybrid static–dynamic selection of implementation alternatives in heterogeneous environments

del Río Astorga, David; Dolz, Manuel F.; Fernández Muñoz, Javier; García Blas, Javier Springer (2019-09)

With the emergence of heterogeneous architectures, developing parallel software has become an increasingly complex task. The ability of using multiple devices in a single application, such as CPUs, accelerators, or ...

Modeling power and energy consumption of dense matrix factorizations on multicore processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2013-10-11)

In this paper, we propose a model for the energy consumption of the concurrent execution of three key dense matrix factorizations, with task parallelism leveraged via the Symmetric Multi-Processing Superscalar (SMPSs) ...

Modeling power and energy of the task-parallel Cholesky factorization on multicore processors

Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2014-05)

In this paper we introduce a model for the total energy consumption of the Cholesky factorization on a multicore processor. Our model assumes a task-parallel execution of the factorization process, with concurrency leveraged ...

Optimising Convolutions for Deep Learning Inference On ARM Cortex-M Processors

Maciá-Lillo, Antonio; Barrachina Mir, Sergio; Fabregat Llueca, German; Dolz, Manuel F. Institute of Electrical and Electronics Engineers Inc. (2024-04-30)

We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. ...

Paving the way towards high-level parallel pattern interfaces for data stream processing

del Río Astorga, David; Dolz, Manuel F.; Fernández Muñoz, Javier; García, J. Daniel Elsevier (2018-10)

The emergence of the Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, ...

Performance modeling of the sparse matrix–vector product via convolutional neural networks

Barreda Vayá, Maria; Dolz, Manuel F.; Castaño Álvarez, María Asunción; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2020-02-04)

Modeling the execution time of the sparse matrix–vector multiplication (SpMV) on a current CPU architecture is especially complex due to (i) irregular memory accesses; (ii) indirect memory referencing; and (iii) low ...

Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

Dolz, Manuel F.; Barrachina Mir, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat Llueca, German; Tomás, Andrés E. Springer (2023)

In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) ...

PyDTNN: A user-friendly and extensible framework for distributed deep learning

Barrachina Mir, Sergio; Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Mestre Miravet, Jose Ignacio Springer (2021-02-22)

We introduce a framework for training deep neural networks on clusters of computers with the following appealing properties: (1) It is developed in Python, exposing an amiable interface that provides an accessible entry ...

Reformulating the direct convolution for high-performance deep learning inference on ARM processors

Barrachina Mir, Sergio; Castelló, Adrián; Dolz, Manuel F.; Low, Tze Meng; Martinez, Hector; Quintana-Orti, Enrique S.; Upasana, Sridhar; Tomás Domínguez, Andrés Enrique Elsevier (2022-12-20)

We present two high-performance implementations of the convolution operator via the direct algorithm that outperform the so-called lowering approach based on the im2col transform plus the gemm kernel on an ARMv8-based ...

Towards Automatic Parallelization of Stream Processing Applications

Dolz, Manuel F.; del Río Astorga, David; Fernández Muñoz, Javier; García, J. Daniel; Carretero, Jesús IEEE (2018-08)

Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover ...

Urban sound classification using neural networks on embedded FPGAs

Belloch, Jose A.; Coronado, Raul; Valls, Oscar; Amor, Rocío del; Leon, German; Naranjo, Valery; Dolz, Manuel F.; Amor-Martin, Adrian; Piñero, Gema Springer (2024-03-01)

Sound classification using neural networks has recently produced very accurate results. A large number of different applications use this type of sound classifiers such as controlling and monitoring the type of activity ...

Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs

Barrachina Mir, Sergio; Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Mestre Miravet, Jose Ignacio Springer (2021-08-30)

In this work, we build a general piece-wise model to analyze data-parallel (DP) training costs of convolutional neural networks (CNNs) on clusters of GPUs. This general model is based on i) multi-layer perceptrons (MLPs) ...